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Abstract 

How can we generate realistic networks? In addition, how can we do so with a mathematically 
tractable model that allows for rigorous analysis of network properties? Real networks exhibit a 
long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for 
the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over 
time. Current network models and generators either fail to match several of the above properties, are 
complicated to analyze mathematically, or both. Here we propose a generative model for networks 
that is both mathematically tractable and can generate networks that have all the above mentioned 
structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker 
product, to generate graphs which we refer to as "Kronecker graphs". 

First, we show that Kronecker graphs naturally obey common network properties. In fact, 
we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker 
graphs can effectively model the structure of real networks. 

We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph gen- 
eration model to large real networks. A naive approach to fitting would take super-exponential 
time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix 
multiplication and by using statistical simulation techniques. 



Experiments on a wide range of large real and synthetic networks show that KronFit finds 
accurate parameters that very well mimic the properties of target networks. In fact, using just 
four parameters we can accurately model several aspects of global network structure. Once fitted, 
the model parameters can be used to gain insights about the network structure, and the resulting 
synthetic graphs can be used for null-models, anonymization, extrapolations, and graph summa- 
rization. 



Keywords: Kronecker graphs, Network analysis, Network models, Social networks, Graph gen- 
erators, Graph mining, Network evolution 
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1. Introduction 



What do real graphs look like? How do they evolve over time? How can we generate synthetic, but 
realistic looking, time-evolving graphs? Recently, network analysis has been attracting much inter- 
est, with an emphasis on finding patterns and abnormalities in social networks, computer networks, 
e-mail interactions, gene regulatory networks, and many more. Most of the work focuses on static 
snapshots of graphs, where fascinating "laws" have been discovered, including small diameters and 
heavy-tailed degree distributions. 

In parallel with discoveries of such structural "laws" there has been effort to find mechanisms 
and models of network formation that generate networks with such structures. So, a good realistic 
network generation model is important for at least two reasons. The first is that it can generate 
graphs for extrapolations, hypothesis testing, "what-if" scenarios, and simulations, when real graphs 
are difficult or impossible to collect. For example, how well will a given protocol run on the Internet 
five years from now? Accurate network models can produce more realistic models for the future 
Internet, on which simulations can be run. The second reason is more subtle. It forces us to think 
about network properties that generative models should obey to be realistic. 

In this paper we introduce Kronecker graphs, a generative network model which obeys all the 
main static network patterns that have appeared in the literature (Faloutsos et al. . 19991: 1 Albert et al 



1999 : lChakrabarti et alll2004l : lFarkas et aUl2001l : lMihail and Papadimitrioul . l2002l :IWatts and Strogatz . 
1998 ). Our m odel also obeys recently discovered temporal evolution patterns ( Leskovec et all 



2005a 12007 af). And, contrary to other models that match th i s combination of network properties (as 



for example. (IBu a nd Tow sleyl 120021 : iKlemm and Eguflua . 120021 : IVazquea. |2003|; iLeskovec et al 



2005b; IZheleva et all 12009m . Kronecker graphs also lead to tractable analysis and rigorous proofs. 



Furthermore, the Kronecker graphs generative process also has a nice natural interpretation and 
justification. 

Our model is based on a matrix operation, the Kronecker product. There are several known 
theorems on Kronecker products. They correspond exactly to a significant portion of what we want 
to prove: heavy-tailed distributions for in-degree, out-degree, eigenvalues, and eigenvectors. We 
also demonstrate how a Kronecker graphs can match the behavior of several real networks (social 
networks, citations, web, internet, and others). While Kronecker products have been studied by 
the algebraic combinato rics community (see, e.g., ( Chow . 1997 : ImrichL 1998 : Imrich and Klavzar . 
200d : lHammackL 12009m . the present work is the first to employ this operation in the design of 



network models to match real data. 

Then we also make a step further and tackle the following problem: Given a large real network, 
we want to generate a synthetic graph, so that the resulting synthetic graph matches the properties 
of the real network as well as possible. 

Ideally we would like: (a) A graph generation model that naturally produces networks where 
many properties that are also found in real networks naturally emerge, (b) The model parameter 
estimation should be fast and scalable, so that we can handle networks with millions of nodes, (c) 
The resulting set of parameters should generate realistic-looking networks that match the statistical 
properties of the target, real networks. 

In general the problem of modeling network structure presents several conceptual and engineer- 
ing challenges: Which generative model should we choose, among the many in the literature? How 
do we measure the goodness of the fit? (Least squares don't work well for power laws, for subtle 
reasons!) If we use likelihood, how do we estimate it faster than in time quadratic on the number 
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of nodes? How do we solve the node correspondence problem, i.e., which node of the real network 
corresponds to what node of the synthetic one? 

To answer the above questions we present KronFit, a fast and scalable algorithm for fitting 
Kronecker graphs by using the maximum likelihood principle. When calculating the likelihood there 
are two challenges: First, one needs to solve the node correspondence problem by matching the 
nodes of the real and the synthetic network. Essentially, one has to consider all mappings of nodes 
of the network to the rows and columns of the graph adjacency matrix. This becomes intractable 
for graphs with more than tens of nodes. Even when given the "true" node correspondences, just 
evaluating the likelihood is still prohibitively expensive for large graphs that we consider, as one 
needs to evaluate the probability of each possible edge. We present solutions to both of these 
problems: We develop a Metropolis sampling algorithm for sampling node correspondences, and 
approximate the likelihood to obtain a linear time algorithm for Kronecker graph model parameter 
estimation that scales to large networks with millions of nodes and edges. KronFit gives orders 
of magnitude speed-ups against older methods (20 minutes on a commodity PC, versus 2 days on a 
50-machine cluster). 

Our extensive experiments on synthetic and real networks show that Kronecker graphs can 
efficiently model statistical properties of networks, like degree distribution and diameter, while 
using only four parameters. 

Once the model is fitted to the real network, there are several benefits and applications: 

(a) Network structure: the parameters give us insight into the global structure of the network 
itself. 

(b) Null-model: when working with network data we would often like to assess the significance 
or the extent to which a certain network property is expressed. We can use Kronecker graph 
as an accurate null-model. 

(c) Simulations: given an algorithm working on a graph we would like to evaluate how its per- 
formance depends on various properties of the network. Using our model one can generate 
graphs that exhibit various combinations of such properties, and then evaluate the algorithm. 

(d) Extrapolations: we can use the model to generate a larger graph, to help us understand how 
the network will look like in the future. 

(e) Sampling: conversely, we can also generate a smaller graph, which may be useful for run- 
ning simulation experiments {e.g., simulating routing algorithms in computer networks, or 
virus/worm propagation algorithms), when these algorithms may be too slow to run on large 
graphs. 

(f) Graph similarity: to compare the similarity of the structure of different networks (even of 
different sizes) one can use the differences in estimated parameters as a similarity measure. 

(g) Graph visualization and compression: we can compress the graph, by storing just the model 
parameters, and the deviations between the real and the synthetic graph. Similarly, for visual- 
ization purposes one can use the structure of the parameter matrix to visualize the backbone 
of the network, and then display the edges that deviate from the backbone structure. 
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(h) Anonymization: suppose that the real graph cannot be publicized, like, e.g., corporate e-mail 
network or customer-product sales in a recommendation system. Yet, we would like to share 
our network. Our work gives ways to such a realistic, 'similar' network. 



The current paper builds o n our previous work on Kronecker graphs (ILeskovec et all l2005al : 
Leskovec and Faloutsosl . 120070 and is organized as follows: Section |2] briefly surveys the related 
literature. In section [3] we introduce the Kronecker graph model, and give formal statements about 
the properties of networks it generates. We investigate the model using simulation in Section @] 
and continue by introducing KronFit, the Kronecker graphs parameter estimation algorithm, in 
Section [5] We present experimental results on a wide range of real and synthetic networks in 
Section[6] We close with discussion and conclusions in sections |7] and [8] 



2. Relation to previous work on network modeling 

Networks across a wide range of domains present surprising regularities, such as power laws, small 
diameters, communities, and so on. We use these patterns as sanity checks, that is, our synthetic 
graphs should match those properties of the real target graph. 

Most of the related work in this field has concentrated on two aspects: properties and pat- 
terns found in real-world networks, and then ways to find models to build understanding about the 
emergence of these properties. First, we will discuss the commonly found patterns in (static and 
temporally evolving) graphs, and finally, the state of the art in graph generation methods. 



2.1 Graph Patterns 



Here we briefly introduce the network patterns (also referred to as properties or statistics) that we 
will later use to compare the similarity between the real networks and their synthetic counterparts 
produced by the Kronecker graphs model. While many patterns have been discovered, two of the 
principal ones are heavy-tailed degree distributions and small diameters. 

Degree distribution: The degree-distribution of a graph is a power law if the number of nodes 
Nd with degree d is given by oc d~ J (7 > 0) where 7 is called th e power law exponent - 
Power laws have be en found in the I nternet (Faloutsos et al. . 1999h . the Web ( Kleinberg et al. . 19991: 
Broder et all 120001 1. citation graphs (|Rednerl . ll998r) . online social networks (|Chakrabarti et alll2004h 
and many others. 

Small diameter: Most real-world graph s exhibit relative ly small diameter (the "small- world" 
phenomenon, or "six degrees of separation" ( Milgram l . 1967b '): A graph has diameter D if every pair 
of nodes can be connected by a path of length at most D edges. The diameter D is susceptible to out- 
liers. Thus, a more robust measure of the pair wise distances between nodes in a graph is the integer 
effective diameter (ITauro et alll200ll) . which is the minimum number of links (steps/hops) in which 
some fraction (or quantile q, say q = 0.9) of all connected pairs of nodes can reach each other. Here 
we make use of effective diameter which we define as follows (ILeskovec et all 12005m . For each 
natural number h, let g(h) denote the fraction of connected node pairs whose shortest connecting 
path has length at most h, i.e., at most h hops away. We then consider a function defined over all 
positive real numbers x by linearly interpolating between the points (h,g(h)) and (h + l,g(h+l)) 
for each x, where h = [x\, and we define the effective diameter of the network to be the value x at 
which the function g{x) achieves the value 0.9. The effective diameter has been found to be small 
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for large real-world graphs, like Internet, W eb, and online social networks (lAlbert and Barabasi 



20021 : iMilgramL 1 19671 : iLeskovec et al .. 2005b). 



Hop-plot: It extends the notion of diameter by pl otting the number of reachable pairs g{h) 
within h hops, as a function of the number of hops h (IPalmer et all I2002T) . It gives us a sense of 
how quickly nodes' neighborhoods expand with the number of hops. 

Scree plot: This is a plot of the eigenvalues (or singular values) of the graph adjacency matrix, 
versus their rank, u sing the logarithmic scale. The scree plot is also often found to approximately 
obey a power law dChakrabarti et all 120041 : Farkas et all 120011). Moreover, t his pa t tern was also 



found analytically for random power law graphs (IMihail and Papadimitrioul . 120021 : IChung et al. 



2003). 



Network values: The distribution of eigenvector components (indicators of "network value") 
associated to the largest e igenvalue of the graph adjacency matrix has also been found to be skewed 
dChakrabarti et al.Ll2004h . 

Node trian gle participation: Edges in real- world networks and especially in social networks 
tend to cluster (IWatts and Strogatzl.ll998l) and form triads of connected nodes. Node triangle partic- 
ipation is a measure of transitivity in networks. It counts the number of triangles a node participates 
in, i.e., the number of connections between the neighbors of a node. The plot of the number of 
triangles A versus the num ber of nodes that participate in A triangles has also been found to be 
skewed (ITsourakakisl . 120080 . 

Densification power law: The relation between the number of edges E(t) and the number of 
nodes N(t) in evolving network at time t obeys the densification power law (DPL), which states 
that E(t) oc N(t) a . The densification exponent a is typically greater than 1, implying that the 
average degree of a node in the network is increasing over time (as the network gains more nodes 
and edges). This me ans that real networks tend to sprout many more edges than nodes, and thus 
densify as they grow ( Leskovec et al. . 2005bl. 2007 ah . 

Shrinking diameter: The effective di ameter of graphs tends to shrink or stabilize as the number 
of nodes in a network grows over time (ILeskovec et all l2005bl. 12007 al) . This is somewhat coun- 
terintuitive since from common experience as one would expect that as the volume of the object (a 
graph) grows, the size (i.e., the diameter) would also grow. But for real networks this does not hold 
as the diameter shrinks and then seems to stabilize as the network grows. 



2.2 Generative models of network structure 



The earliest probabilistic generative model for graphs was the Erdos-Renyi (lErdos and Renyilll960l) 
random graph model, where each pair of nodes has an identical, independent probability of being 
joined by an edge. The study of this model has led to a rich mathematical theory. However, as the 
model was not developed to model real-world networks it produces graphs that fail to match real 
networks in a number of respects (for example, it does not produce heavy-tailed degree distribu- 
tions). 

The vast majority of r e cent network models invq l ye some form of preferential attachment 



1999|; 



Kumar et al. 



(Barabasi and Albert. 1 19991: 1 Alb ert and Barabasi, 120021 : IWinick and Jaminl. 120021 : Kleinberg et al. 



19991 : iFlaxman et all I2007D that employs a simple rule: new node joins the 



graph at each time step, and then creates a connection to an existing node u with the probability 
proportional to the degree of the node u. This leads to the "rich get richer" phenomena and to power 
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law tails in degree distribution. However, the diameter in this model grows slowly with the number 
of nodes N, which violates the "shrinking diameter" property mentioned above. 

There are also many variations of preferential attachme nt model, all someh ow employing the 
"rich get richer" type mechanism, e.g., the "copying model" (IKumar et all 120001). the " winner does 
not take all" model (IPennock et all 120020 . the "forest fire" model (ILeskovec et all |2005bJ), the 
"random surfer model" (IBlum et all 120061) . etc. 

A different family of network methods strives for small diame ter and local cluste r ing in net- 
works. Examples of s uch models inclu de the small-world model ( Watts and Strogatz . 19981) and 
the Waxman generator ( Waxman . 1988b . Another family of models sho ws that heavy tails emerge 
if nodes try to optim ize their connectivity under resource constraints ( Carlson and Doyle . 19991 : 
Fabrikantetalll2002l) . 

In summary, most current models focus on modeling only one (static) network property, and 
neglect the others. In addition, it is usually hard to analytically analyze properties of the network 
model. On the other hand, the Kronecker graph model we describe in the next section addresses 
these issues as it matches multiple properties of real networks at the same time, while being analyt- 
ically tractable and lending itself to rigorous analysis. 



2.3 Parameter estimation of network models 

Until recently relatively little effort was made to fit the above network models to real data. One of 
the difficulties is that most of the above models usually define a mechanism or a principle by which 
a network is constructed, and thus parameter estimation is either trivial or almost impossible. 

Most work in estimating network models comes from the area of social sciences, statistics and 
social net work analysis where the expone ntial random graphs, also known as p* model, were in- 
troduced ( Wasserman and Pattison . fl996l ). The model essentially defines a log linear model over 
all possible graphs G, p(G\8) oc exp(9 T s(G)), where G is a graph, and s is a set of functions, 
that can be viewed as summary statistics for the structural features of the network. The p* model 
usually focuses on "local" structural features of networks (like, e.g., characteristics of nodes that 
determine a presence of an edge, link reciprocity, etc.). As exponential random graphs have been 
very useful for modeling small networks, and individual nodes and edges, our goal here is different 
in a sense that we aim to accurately model the structure of the network as a whole. Moreover, we 
aim to model and estimate parameters of networks with millions of nodes, while even for graphs 
of small size (> 100 nodes) the number of model parameters in exponential random graphs usually 
becomes too large, and estimation prohibitively expensive, both in terms of computational time and 
memory. 

Regardless of a particular choice of a network model, a common theme when estimating the 
likelihood P{G) of a graph G under some model is the challenge of finding the correspondence be- 
tween the nodes of the true network and its synthetic counterpart. The node correspondence problem 
results in the factorially many possible matchings of nodes. One can think of the correspondence 
problem as a test of graph isomorphism. Two isomorphic graphs G and G' with differently assigned 
node IDs should have same likelihood P(G) = P{G') so we aim to find an accurate mapping 
between the nodes of the two graphs. 

An ordering or a permutation defines the mapping of nodes in one network to nodes in the 
other network. For example, Butts ( ButtsL 2005 ) used permutat ion sampling to determ ine similarity 
between two graph adjacency matrices, while Bezakova et al. (IBezakova et all 120060 used permu- 
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Table 1 : Table of symbols. 



tations for graph model selection. Recentl y, an approach for estimating parameters of the "copying" 
model was introduced ( Wiuf et al. , 200ut) . however authors also note that the class of "copying" 



models may not be rich enough to accurately model real networks. As we show later, Kronecker 
graph model seems to have the necessary expressive power to mimic real networks well. 

3. Kronecker graph model 

The Kronecker graph model we propose here is based on a recursive construction. Defining the 
recursion properly is somewhat subtle, as a number of standard, related graph construction methods 
fail to produce graphs that densify according to the patterns observed in real networks, and they also 
produce graphs whose diameters increase. To produce densifying graphs with constant/shrinking 
diameter, and thereby match the qualitative behavior of a real network, we develop a procedure that 
is best described in terms of the Kronecker product of matrices. 

3.1 Main idea 

The main intuition behind the model is to create self-similar graphs, recursively. We begin with an 
initiator graph K\ , with Ni nodes and E\ edges, and by recursion we produce successively larger 
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graphs K2, K%, . . . such that the k th graph Kf. is o n Nk = iVf nodes. If we want these graphs 



to exhibit a version of the Densification power law (ILeskovec et all |2005b), then Kk should have 
Ek = E\ edges. This is a property that requires some care in order to get right, as sta ndard recursive 



const ructions (for example, the traditional Cartesian product or the construction of (IBarabasi et al 
20011)) do not yield graphs satisfying the densification power law. 



It turns out that the Kronecker product of two matrices is the right tool for this goal. 
Kronecker product is defined as follows: 



The 



Definition 1 (Kronecker product of matrices) Given two matrices A = [a^j] and B of sizes n x 
m and n' x m' respectively, the Kronecker product matrix C of dimensions (n ■ n') x (m ■ m!) is 
given by 

( ai^B oi j2 B . . . oi jTO B \ 
a,2, iB a.2,2B . . . a2, m B 



A (g) B = 



(1) 



\ On.iB a„ 5 2B . . . a n>m B J 



We then define the Kronecker product of two graphs simply as the Kronecker product of their 
corresponding adjacency matrices. 



Definition 2 (Kronecker product of graphs (Weichsel, 1962)) If G and H are graphs with adja- 
cency matrices A(G) and A(H) respectively, then the Kronecker product G ® H is defined as the 
graph with adjacency matrix A(G) ® A(H). 



Observation 1 (Edges in Kronecker-multiplied graphs) 

Edge (X&Xu) £G®Hijf (Xi,X k ) G G and {Xj,X{) G H 

where Xij and Xki are nodes in G ® H, and X& Xj, Xk and X\ are the corresponding nodes in G 
and H, as in Figure\J} 

The last observation is crucial, and deserves elaboration. Basically, each node in G <8> H can be 
represented as an ordered pair Xij, with i a node of G and j a node of H, and with an edge joining 
X^ and X^ precisely when (Xj, Xk) is an edge of G and (Xj, Xi) is an edge of H. This is a direct 
consequence of the hierarchical nature of the Kronecker product. Figure QJa-c) further illustrates 
this by showing the recursive construction of G (8) H, when G = H is a 3-node chain. Consider 
node Xi t 2 in FigureQJc): It belongs to the H graph that replaced node X\ (see FigureQJb)), and in 
fact is the X2 node (i.e., the center) within this small iT-graph. 

We propose to produce a growing sequence of matrices by iterating the Kronecker product: 

Definition 3 (Kronecker power) The k th power of K\ is defined as the matrix K[ k ^ ( abbreviated 
to Kk), such that: 

Kf ] =K k = K\ g Ki ®...Ki = K k ^ ® K x 

s v y 

k times 
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(a) Graph K x 
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(d) Adjacency matrix 
of K\ 





Central node is X 

(b) Intermediate stage (c) Graph K 2 



2,2 










Kl 


K l 


K i 





K l 


K i 



(e) Adjacency matrix 

of K 2 = K l ® K x 



Figure 1: Example of Kronecker multiplication: Top: a "3-chain" initiator graph and its Kronecker 
product with itself. Each of the Xi nodes gets expanded into 3 nodes, which are then 
linked using Observation [TJ Bottom row: the corresponding adjacency matrices. See 
figure|2]for adjacency matrices of K$ and K4. 



Definition 4 (Kronecker graph) Kronecker graph of order k is defined by the adjacency matrix 

\k] 

K[ , where K\ is the Kronecker initiator adjacency matrix. 



The self-similar nature of the Kronecker graph product is clear: To produce from Kf.-i, 
we "expand" (replace) each node of K^-i by converting it into a copy of K±, and we join these 
copies together according to the adjacencies in Kk-i (see Figure [[J. This process is very natural: 
one can imagine it as positing that communities within the graph grow recursively, with nodes in 
the community recursively getting expanded into miniature copies of the community. Nodes in the 
sub-community then link among themselves and also to nodes from other communities. 

Note that there are many different names to refer to Kronecker product of graphs. Other names 
for the Kronecker product are tensor product, categorical product, direct product, cardinal prod- 
uct, relational product, conjun ction, weak direct product or just product, and even Cartesian prod- 
uct (|lmrich and Klavzan. 



3.2 Analysis of Kronecker graphs 

We shall now discuss the properties of Kronecker graphs, specifically, their degree distributions, 
diameters, eigenvalues, eigenvectors, and time-evolution. Our ability to prove analytical results 
about all of these properties is a major advantage of Kronecker graphs over other network models. 



9 



Leskovec, Chakrabarti, Kleinberg, Faloutsos, and Ghahramani 




(a) K% adjacency matrix (27 x 27) 



(b) K4 adjacency matrix (81 x 81) 



Figure 2: Adjacency matrices of K3 and K±, the 3 and 4 Kronecker power of K\ matrix as 
denned in Figure [TJ Dots represent non-zero matrix entries, and white space represents 
zeros. Notice the recursive self-similar structure of the adjacency matrix. 



3.2.1 Degree distribution 



The next few theorems prove that several distributions of interest are multinomial for our Kronecker 
graph model. This is important, because a careful choice of the initial graph K\ makes the result- 
ing multino mial distribution to behave like a po wer law or Discrete Gaussian Exponential (DGX) 



distribution dBi et all. 1200 lUClauset et all 120071) . 



Theorem 5 (Multinomial degree distribution) Kronecker graphs have multinomial degree distri- 
butions, for both in- and out-degrees. 



Proof Let the initiator K\ have the degree sequence dx, cfe, ■ ■ ■ , djVi- Kronecker multiplication of 
a node with degree d expands it into N\ nodes, with the corresponding degrees being d x d± , d x 
d%, ■ ■ ■ ,d x djvi- After Kronecker powering, the degree of each node in graph is of the form 
di x x di 2 x . . . di k , with ii, «2> • • • > *fc £ (1 • • • an d there is one node for each ordered combi- 
nation. This gives us the multinomial distribution on the degrees of K^. So, graph will have 
multinomial degree distribution where the "events" (degrees) of the distribution will be combina- 
tions of degree products: d^d 1 ^ . . . c^ 1 (where Ylj=i h = an d event (degree) probabilities will 
be proportional to ( . . k ■ ) . Note also that this is equivalent to noticing that the degrees of nodes 
in Kk can be expressed as the k th Kronecker power of the vector (d\, cfe; ■ ■ ■ , d^). ■ 
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Figure 3: Two examples of Kronecker initiators on 4 nodes and the self-similar adjacency matrices 
they produce. 



3.2.2 Spectral properties 

Next we analyze the spectral properties of adjacency matrix of a Kronecker graph. We show that 
both the distribution of eigenvalues and the distribution of component values of eigenvectors of the 
graph adjacency matrix follow multinomial distributions. 

Theorem 6 (Multinomial eigenvalue distribution) The Kronecker graph has a multinomial 
distribution for its eigenvalues. 



Pro of Let have the eigenvalues A1 , A2, • • ■ , Ajvi- By properties of the Kronecker multiplica- 
tion (ILoan , l2000l : lLangville and Steward. I2004I) . the eigenvalues of are the k th Kronecker power 
of the vector of eigenvalues of the initiator matrix, (Ai, A2, . . . , AjyJ^- As in Theorem[5J the eigen- 
value distribution is a multinomial. ■ 



A similar argument using properties of Kronecker matrix multiplication shows the following. 

Theorem 7 (Multinomial eigenvector distribution) The components of each eigenvector of the 
Kronecker graph follow a multinomial distribution. 



Proof Let K-\ hav e the eigenvectors v-\ , . . . . vn, ■ By properties of the Kronecker multipli- 
cation ( Loan . 2000I : Langville and Stewart . 2004 ). the eigenvectors of are given by the k th 
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Kronecker power of the vector: «2j • • • > ^7Vi)> which gives a multinomial distribution for the 
components of each eigenvector in K^. ■ 

We have just covered several of the static graph patterns. Notice that the proofs were a direct 
consequences of the Kronecker multiplication properties. 

3.2.3 Connectivity of Kronecker graphs 

We now present a series of results on the connectivity of Kronecker graphs. We show, maybe a bit 
surprisingly, that even if a Kronecker initiator graph is connected its Kronecker power can in fact 
be disconnected. 

Lemma 8 If at least one of G and H is a disconnected graph, then G (g> H is also disconnected. 

Proof Without loss of generality we can assume that G has two connected components, while H 
is connected. Figure [Ua) illustrates the corresponding adjacency matrix for G. Using the nota- 
tion from Observation Q] let graph let G have nodes X\,... ,X n , where nodes {X\, . . . X r } and 
{X r+ i, . . . ,X n } form the two connected components. Now, note that {Xij,X^i) £ G <g> H for 
i G {1, . . . , r}, k G {r + 1, . . . , n}, and all j, I. This follows directly from Observation Q] as 
(Xi, Xk) are not edges in G. Thus, G $S> H must at least two connected components. ■ 

Actually it turns out that both G and H can be connected while G ® H is disconnected. The 
following theorem analyzes this case. 

Theorem 9 If both G and H are connected but bipartite, then G ® H is disconnected, and each of 
the two connected components is again bipartite. 

Proof Again without loss of generality let G be bipartite with two partitions A = {X\ , . . . X r } and 
B = {X r+ i, . . . , X n }, where edges exists only between the partitions, and no edges exist inside 
the partition: (Xi,Xk) £ G for i,k G A or i,k G B. Similarly, let H also be bipartite with 
two partitions C = {X\, . . . X s } and D = {X s+ \, . . . , X m }. Figures |4jb) and (c) illustrate the 
structure of the corresponding adjacency matrices. 

Now, there will be two connected components in G ® H: I st component will be composed of 
nodes {Xij} G G (g> H , where (i G A,j G D) or (i G 8,j 6 C). And similarly, 2 nd component 
will be composed of nodes {X^}, where (i G A, j G C) or (i G B,j G D). Basically, there 
exist edges between node sets (A,D) and (B,C), and similarly between (A,C) and (B,D) but 
not across the sets. To see this we have to analyze the cases using Observation [Q For example, 
in G <g> H there exist edges between nodes (A,C) and (B,D) as there exist edges (i,k) G G for 
i G A,k G B, and (j, I) G H for j G C and I G D. Similar is true for nodes (A, C) and (B, D). 
However, there are no edges cross the two sets, e.g., nodes from (A, D) do not link to (A, C), as 
there are no edges between nodes in A (since G is bipartite). See Figures H^d) andHJe) for a visual 
proof. ■ 

Note that bipartite graphs are triangle free and have no self -loops. Stars, chains, trees and cycles 
of even length are all examples of bipartite graphs. In order to ensure that is connected, for the 
remained of the paper we focus on initiator graphs K\ with self loops on all of the vertices. 
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(d) Kronecker product of 
two bipartite graphs G and H 



(e) Rearranged adjacency 
matrix from panel (d) 



Figure 4: Graph adjacency matrices. Dark parts represent connected (filled with ones) and white 
parts represent empty (filled with zeros) parts of the adjacency matrix, (a) When G is 
disconnected, Kronecker multiplication with any matrix H will result in G <S> H being 
disconnected, (b) Adjacency matrix of a connected bipartite graph G with node partitions 
A and B. (c) Adjacency matrix of a connected bipartite graph G with node partitions C 
and D. (e) Kronecker product of two bipartite graphs G and H. (d) After rearranging the 
adjacency matrix G <g> H we clearly see the resulting graph is disconnected. 



3.2.4 Temporal properties of Kronecker graphs 



We continue with the analysis of temporal patterns of evolution of Kronecker grap hs: the densifica- 
tion power law, and shrinking/stabilizing diameter ( Leskovec et al. , 2005bl . 2007a ). 



Theorem 10 (Densification power law) Kronecker graphs follow the densification power law (DPL) 
with densification exponent a = \og(E\)/ \og{N{). 

Proof Since the k th Kronecker power Kj. has Nk = nodes and Ej, = E\ edges, it satisfies 
Ek = N%, where a = log(£i)/ log(A?i). The crucial point is that this exponent a is independent of 
k, and hence the sequence of Kronecker powers follows an exact version of the densification power 
law. ■ 



We now show how the Kronecker product also preserves the property of constant diameter, a 
crucial ingredient for matching the diameter properties of many real-world network datasets. In 
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order to establish this, we will assume that the initiator graph K\ has a self-loop on every node. 
Otherwise, its Kronecker powers may be disconnected. 

Lemma 11 IfG and H each have diameter at most D and each has a self -loop on every node, then 
the Kronecker graph G <2> H also has diameter at most D. 

Proof Each node in G ® H can be represented as an ordered pair (v, w), with v a node of G and w 
a node of H, and with an edge joining (v, w) and (x, y) precisely when (v, x) is an edge of G and 
(w, y) is an edge of H. (Note this exactly the Observation Q]) Now, for an arbitrary pair of nodes 
(v, w) and (V, w'), we must show that there is a path of length at most D connecting them. Since 
G has diameter at most D, there is a path v = v±, V2, ■ ■ ■ ,v r = v', where r < D. If r < D, we can 
convert this into a path v = v\, V2, ■ ■ ■ , vd = v' of length exactly D, by simply repeating v' at the 
end for D — r times. By an analogous argument, we have a path w = u?i, W2, ■ ■ ■ , Wd = w'. Now 
by the definition of the Kronecker product, there is an edge joining (vi,Wi) and (uj+i, lOj+i) for all 
1 < i < D — 1, and so (v, w) = (yi,w\), (V2, W2), ■ ■ ■ , (vd,wd) = (v', w') is a path of length D 
connecting (v, w) to (V, w'), as required. ■ 



Theorem 12 If K\ has diameter D and a self-loop on every node, then for every k, the graph 
also has diameter D. 

Proof This follows directly from the previous lemma, combined with induction on k. ■ 

As defined in section [2] we also consider the effective diameter D*. We defined the (/-effective 
diameter as the minimum D* such that, for a q fraction of the reachable node pairs, the path length 
is at most D*. The g-effective diameter is a more robust quantity than the diameter, the latter being 
prone to the effects of degenerate structures in the graph (e.g., very long chains). However, the q- 
effective diameter and diameter tend to exhibit qualitatively similar behavior. For reporting results 
in subsequent sections, we will generally consider the q-effective diameter with q = 0.9, and refer 
to this simply as the effective diameter. 

Theorem 13 (Effective Diameter) If K\ has diameter D and a self-loop on every node, then for 
every q, the q-effective diameter of converges to D (from below) as k increases. 

Proof To prove this, it is sufficient to show that for two randomly selected nodes of Kk, the proba- 
bility that their distance is D converges to 1 as k goes to infinity. 

We establish this as follows. Each node in can be represented as an ordered sequence of k 
nodes from K\, and we can view the random selection of a node in as a sequence of k inde- 
pendent random node selections from K\. Suppose that v = (yi, . . . , Vk) and w = (w\, . . . , Wf.) 
are two such randomly selected nodes from K^. Now, if x and y are two nodes in Ki at distance 
D (such a pair (x, y) exists since K\ has diameter D), then with probability 1 — (1 — jj?) k , there 
is some index j for which {vj, Wj} = {x, y}. If there is such an index, then the distance between 
v and w is D. As the expression 1 — (1 — -^i) k converges to 1 as k increases, it follows that the 
q-effective diameter is converging to D. ■ 
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Figure 5: The "staircase" effect. Kronecker initiator and the degree distribution and network value 
plot for the 6 th Kronecker power of the initiator. Notice the non-smoothness of the curves. 



3.3 Stochastic Kronecker graphs 

While the Kronecker power construction discussed so far yields graphs with a range of desired prop- 
erties, its discrete nature produces "staircase effects" in the degrees and spectral quantities, simply 
because individual values have large multiplicities. For example, degree distribution and distri- 
bution of eigenvalues of graph adjacency matrix and the distribution of the principal eigenvector 
components (i.e., the "network" value) are all impacted by this. These quantities are multinomi- 
ally distributed which leads to individual values with large multiplicities. Figure [5] illustrates the 
staircase effect. 

Here we propose a stochastic version of Kronecker graphs that eliminates this effect. There 
are many possible ways how one could introduce stochasticity into Kronecker graph model. Be- 
fore introducing the proposed model, we introduce two simple ways of introducing randomness to 
Kronecker graphs and describe why they do not work. 

Probably the simplest (but wrong) idea is to generate a large deterministic Kronecker graph 
Kjc, and then uniformly at random flip some edges, i.e., uniformly at random select entries of 
the graph adjacency matrix and flip them (1 — ► 0,0 — > 1). However, this will not work, as it 
will essentially superimpose an Erdos-Renyi random graph, which would, for example, corrupt the 
degree distribution - real networks usually have heavy tailed degree distributions, while random 
graphs have Binomial degree distributions. A second idea could be to allow a weighted initiator 
matrix, i.e., values of entries of K\ are not restricted to values {0, 1} but rather can be any non- 
negative real number. Using such K\ one would generate and then threshold the matrix to 
obtain a binary adjacency matrix K, i.e., for a chosen value of e set K[i, j] = 1 if Kk[i,j] > e else 
K[i, j] = 0. This also would not work as the mechanism would selectively remove edges and thus 
the low degree nodes which would have low weight edges would get isolated first. 

Now we define Stochastic Kronecker graph model which overcomes the above issues. A more 
natural way to introduce stochasticity to Kronecker graphs is to relax the assumption that entries of 
the initiator matrix take only binary values. Instead we allow entries of the initiator to take values 
on the interval [0, 1]. This means now each entry of the initiator matrix encodes the probability 
of that particular edge appearing. We then Kronecker-power such initiator matrix to obtain a large 
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stochastic adjacency matrix, where again each entry of the large matrix gives the probability of that 
particular edge appearing in a big graph. Such a stochastic adjacency matrix defines a probability 
distribution over all graphs. To obtain a graph we simply sample an instance from this distribution 
by sampling individual edges, where each edge appears independently with probability given by the 
entry of the large stochastic adjacency matrix. More formally, we define: 

Definition 14 (Stochastic Kronecker graph) Let V\ be a N\ x N\ probability matrix: the value 
6ij € V\ denotes the probability that edge is present, 0« £ [0, 1]. 

Then k th Kronecker power Vy = Vk, where each entry p uv £ Vk encodes the probability of 
an edge (u, v). 

To obtain a graph, an instance for realization), K = RiVk) we include edge (u,v) in K with 
probability p uv , p uv € V k - 

First, note that sum of the entries of V\, can ^ e g rea ter than 1, Second, notice that in 

principle it takes 0(N l 2k ) time to generate an instance K of a Stochastic Kronecker graph from the 
probability matrix Vk- This means the time to get a realization K is quadratic in the size of Vk as 
one has to flip a coin for each possible edge in the graph. Later we show how to generate Stochastic 
Kronecker graphs much faster, in the time linear in the expected number of edges in Vk- 



3.3.1 Probability of an edge 



For the size of graphs we aim to model and generate here taking V\ (or K\) and then explicitly 
performing the Kronecker product of the initiator matrix is infeasible. The reason for this is that V\ 
is usually dense, so Vk is also dense and one can not explicitly store it in memory to directly iterate 
the Kronecker product. However, due to the structure of Kronecker multiplication one can easily 
compute the probability of an edge in Vk- 

The probability p uv of an edge (u, v) occurring in A;-th Kronecker power V = Vk can be 
calculated in 0(k) time as follows: 

k-l 



Pl 



i 



(modiVi) + 1, 



1 



N- 



(modiVi) + 1 



l J 



(2) 



The equation imitates recursive descent into the matrix V, where at every level i the appropriate 
entry of V\ is chosen. Since V has N\ rows and columns it takes 0(klog Ni) to evaluate the 
equation. Refer to Figure [6] for the illustration of the recursive structure of V. 



3.4 Additional properties of Kronecker graphs 

Stoc hastic Kronecker graph s with initiator matrix of size N\ = 2 were studied by Mahdian and 
Xu (|Mahdian and XuL l2007r) . The authors showed a phase transition for the emergence of the gi- 
ant component and another phase transition for connectivity, and proved that such graphs have 
constant di ameters beyond the connectivity threshold, but are not searchable using a decentralized 
algorithm (IKleinbergl . 1 1 9991) . 

General overview of Kronecker product is given in (|lmrich and Klavzar , 2000) and properties 
of Kronecker graphs relat e d to g raph minors, planarity, cut vertex and cut edge have been explored 
in (IBottreau and Metivierl. 1 19981) . Moreover, recently (ITsourakakisl. 120081) gave a closed form ex- 
pression for the number of triangles in a Kronecker graph that depends on the eigenvalues of the 
initiator graph K\ . 
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Figure 6: Stochastic Kronecker initiator V\ and the corresponding 2 Kronecker power T^- Notice 
the recursive nature of the Kronecker product, with edge probabilities in Vi simply being 
products of entries of V\ . 



3.5 Two interpretations of Kronecker graphs 



Next, we present two natural interpretations of the generative process behind the Kronecker graphs 
that go beyond the purely mathematical construction of Kronecker graphs as introduced so far. 

We already mentioned the first interpretation when we first defined Kronecker graphs. One 
intuition is that networks are hierarchically organized into communities (clusters). Communi- 
ties then grow recursively, creating miniature copies of themselves. Figure Q] depicts the process 
of the recursive community expa nsion. In fact, several researchers have argue d that real net- 
works are hierarchically organized ( Ravasz et al. , 2002 : Ravasz and Barab£sll2003h and algorithms 



to extract the netw ork hierarchical structure have also been developed (ISales-Par do et al 



Clauset et all. 120081). Moreov er, especially web graphs ( Dill et all 120021: iDorogovtsev et al. 



2007; 



2002; 



Crovella and Bestavrosl . 119971) and biological networks ( Ravasz and Barabasi 120031 ) were found to 
be self-similar and "fractal". 

The second intuition comes from viewing every node of Vk as being described with an ordered 
sequence of k nodes from V\. (This is similar to the Observation Q] and the proof of Theorem [131) 

Let's label nodes of the initiator matrix V\, u±, . . . ,ujsr 1 , and nodes of Vk as v\,. . . ,v N k. 
Then every node Uj of Vk is described with a sequence (vi(l), . . . ,Vi(k)) of node labels of V\, 
where i>j(Z) € {u±, Similarly, consider also a second node vj with the label sequence 

(vj(l), . . . , Vj(k)). Then the probability p e of an edge (v^, Vj) in Vk is exactly: 



Pe 



Vk[v i ,v j ]=l[V 1 [v l (l),v j (l)} 



1=1 



(Note this is exactly the Equation [2]) 

Now one can look at the description sequence of node Vi as a k dimensional vector of attribute 
values (vi(l),... ,Vi(k)). Then p e (vi,Vj) is exactly the coordinate-wise product of appropriate 
entries of V\, where the node description sequence selects which entries of V\ to multiply. Thus, 
the V\ matrix can be thought of as the attribute similarity matrix, i.e., it encodes the probability of 
linking given that two nodes agree/disagree on the attribute value. Then the probability of an edge 
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is simply a product of individual attribute similarities over the k iVi-valued attributes that describe 
each of the two nodes. 

This gives us a very natural interpretation of Stochastic Kronecker graphs: Each node is de- 
scribed by a sequence of categorical attribute values or features. And then the probability of two 
nodes linking depends on the product of individual attribute similarities. This way Kronecker graphs 
can effectively model homophily (nodes with similar attribute values are more likely to link) by V\ 
having high value entries on the diagonal. Or heterophily (nodes that differ are more likely to link) 
by V\ having high entries off the diagonal. 

Figure [6] shows an example. Let's label nodes of V\ U\,U2 as in Figure Oa). Then every 
node of Vk is described with an ordered sequence of k binary attributes. For example, Figure Ob) 
shows an instance for k = 2 where node vi of V2 is described by (u\,U2), and similarly ^3 by 
(«2, u±). Then as shown in Figure^b), the probability of edge p e (v2, V3) = b ■ c, which is exactly 
V\ [u2, ui] ■Vi[ui,U2] = b ■ c — the product of entries of V\, where the corresponding elements of 
the description of nodes V2 and ^3 act as selectors of which entries of V\ to multiply. 

Figure Oc) further illustrates the recursive nature of Kronecker graphs. One can see Kronecker 
product as recursive descent into the big adjacency matrix where at each stage one of the entries 
or blocks is chosen. For example, to get to entry (^2,^3) one first needs to dive into quadrant b 
following by the quadrant c. This intuition will help us in section 1331 to devise a fast algorithm for 
generating Kronecker graphs. 

However, there are also two notes to make here. First, using a single initiator V\ we are implic- 
itly assuming that there is one single and universal attribute similarity matrix that holds across all 
k iVi-ary attributes. One can easily relax this assumption by taking a different initiator matrix for 
each attribute (initiator matrices can even be of different sizes as attributes are of different arity), 
and then Kronecker multiplying them to obtain a large network. Here each initiator matrix plays the 
role of attribute similarity matrix for that particular attribute. 

For simplicity and convenience we will work with a single initiator matrix but all our methods 
can be trivially extended to handle multiple initiator matrices. Moreover, as we will see later in 
section [6] even a single 2x2 initiator matrix seems to be enough to capture large scale statistical 
properties of real-world networks. 

The second assumption is harder to relax. When describing every node Vi with a sequence of 
attribute values we are implicitly assuming that the values of all attributes are uniformly distributed 
(have same proportions), and that every node has a unique combination of attribute values. So, all 
possible combinations of attribute values are taken. For example, node v\ in a large matrix Vk has 
attribute sequence (u\,U\, . . . ,u\), and has (itx, 14., . . . , u\, ujvi), while the "last" node v N k is 
has attribute values (it/Vi , UNu ■ ■ ■ > UNi)- One can think of this as counting in iVi-ary number sys- 
tem, where node attribute descriptions range from (i.e., "leftmost" node with attribute description 
. . . ,u\)) to (i.e., "rightmost" node attribute description (un^uni, ■ ■ ■ , u Ni))- 

A simple way to relax the above assumption is to take a larger initiator matrix with a smaller 
number of parameters than the number of entries. This means that multiple entries of V\ will share 
the same value (parameter). For example, if attribute u\ takes one value 66% of the times, and the 
other value 33% of the times, then one can model this by taking a 3 x 3 initiator matrix with only 
four parameters. Adopting the naming convention of Figure [6] this means that parameter a now 
occupies a 2 x 2 block, which then also makes b and c occupy 2x1 and 1x2 blocks, and d a single 
cell. This way one gets a four parameter model with uneven feature value distribution. 
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We note that the view of Kronecker graphs where every node is described with a set of features 
and the initiator matrix encodes the probability of linking given the attribute values of two nodes 



some what resembles the Random dot product graph model (|Young and Scheinermanl . 120071 : iNickell. 



20081) . The important difference here is that we multiply individual linking probabilities, while in 
Random dot product graphs one takes the sum of individual probabilities which seems somewhat 
less natural. 

3.6 Fast generation of Stochastic Kronecker graphs 

The intuition for fast generation of Stochastic Kronecker graphs comes from t he recursive nature 



of the Kronecker product and is closely related to the R-MAT graph generator (|Chakrabarti et al 



20041) . Generating a Stochastic Kronecker graph K on N nodes naively takes 0{N 2 ) time. Here 
we present a linear time 0(E) algorithm, where E is the (expected) number of edges in K. 

Figure[6]c) shows the recursive nature of the Kronecker product. To "arrive" to a particular edge 
(vi,Vj) of Vk one has to make a sequence of k (in our case k = 2) decisions among the entries of V\, 
multiply the chosen entries of V\, and then placing the edge (vi,vj) with the obtained probability. 

Instead of flipping 0(N 2 ) = 0(N 2k ) biased coins to determine the edges, we can place E edges 
by directly simulating the recursion of the Kronecker product. Basically we recursively choose sub- 
regions of matrix K with probability proportional to 9ij, 6ij G V\ until in k steps we descend to a 
single cell of the big adjacency matrix K and place an edge. For example, for (v 2, V3) in Figure 0c) 
we first have to choose b following by c. 

The probability of each individual edge of Vk follows a Bernoulli dis tribution, as the edge 



occurrences are independent. By the Central Limit Theorem jPetrovL 1 19951) the number of edges 
in Vk tends to a normal distribution with mean (Yli j=i@ij) k = where 0y € V\. So, given 
a stochastic initiator matrix V\ we first sample the expected number of edges E in Vk- Then we 
place E edges in a graph K, by applying the recursive descent for k steps where at each step we 
choose entry (i,j) with probability 6ij/E\ where E\ = Since we add E = E\ ed ges, 

the probability that edge (vi,Vj) appears in K is exactly Vk[vi,Vj]. This basically means that in 
Stochastic Kronecker graphs the initiator matrix encodes both the total number of edges in a graph 
and their structure. Yl &ij encodes the number of edges in the graph, while the proportions (ratios) 
of values 9ij define how many edges each part of graph adjacency matrix will contain. 

In practice it can happen that more than one edge lands in the same (vi,vj) entry of big adja- 
cency matrix K. If an edge lands in a already occupied cell we insert it again. Even though values 
of V\ are usually skewed, adjacency matrices of real networks are so sparse that this is not really a 
problem in practice. Empirically we note that around 1% of edges collide. 

3.7 Observations and connections 

Next, we describe several observations about the properties of Kronecker graphs and make connec- 
tions to other network models. 



• Bipartite graphs: Kronecker graphs can naturally model bipartite graphs. Instead of starting 
with a square N\ x iVi initiator matrix, one can choose arbitrary N\ x M\ initiator matrix, 
where rows define "left", and columns the "right" side of the bipartite graph. Kronecker 
multiplication will then generate bipartite graphs with partition sizes N\ and M\. 
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• Graph distributions: Vk defines a distribution over all graphs, as it encodes the probability 
of all possible N± k edges appearing in a graph by using an exponentially smaller number of 
parameters (just iVf). As we will later see, even a very small number of parameters, e.g., 4 
(2x2 initiator matrix) or 9 (3 x 3 initiator), is enough to accurately model the structure of 
large networks. 



Extension of Erdos-Renyi random graph model: Stochastic Kronecker graphs represent an 
extension of Erdos-Renyi (|Erdos and Renyil. 1 1960 ) random graphs. If one takes V\ = [Oij], 



where every 6^ = p then we obtain exactly the Erdos-Renyi model of random graphs G n>p , 
where every edge appears independently with probability p. 

Relation to the R-MAT model: The re cursive nature of Stocha stic Kronecker graphs makes 



them related to the R-MAT generator (|Chakrabarti et all 120041 ). The difference between the 



two models is that in R-MAT one needs to separately specify the number of edges, while 
in Stochastic Kronecker graphs initiator matrix V\ also encodes the number of edges in the 
graph. Section [331 built on this similarity to devise a fast algorithm for generating Stochastic 
Kronecker graphs. 

Densification: Similarly as with deterministic Kronecker graphs the number of nodes in 
a Stochastic Kronecker graph grows as N*, and the expected number of edges grows as 
(Ylij @ij) k - This means one would want to choose values 9^ of the initiator matrix V\ so that 
Ylij @ij > -Wi m order for the resulting network to densify. 



4. Simulations of Kronecker graphs 

Next we perform a set of simulation experiments to demonstrate the ability of Kronecker graphs to 
match the patterns of real-world networks. We will tackle the problem of estimating the Kronecker 
graph model from real data, i.e., finding the most likely initiator V\, in the next section. Instead here 
we present simulation experiments using Kronecker graphs to explore the parameter space, and to 
compare properties of Kronecker graphs to those found in large real networks. 



4.1 Comparison to real graphs 

We observe two kinds of graph patterns — "static" and "temporal." As mentioned earlier, com- 
mon static patterns include degree distribution, scree plot (eigenvalues of graph adjacency matrix 
vs. rank) and distribution of components of the principal eigenvector of graph adjacency matrix. 
Temporal patterns include the diameter over time, and the densification power law. For the diameter 
computation, we use the effective diameter as defined in Section [2] 

For the puipose of this section consider the following setting. Given a real graph G we want 
to find Kronecker initiator that produces qualitatively similar graph. In principle one could try 
choosing each of the Nf parameters for the matrix V\ separately. However, we reduce the number 
of parameters from N\ to just two: a and (3. Let K\ be the initiator matrix (binary, deterministic). 
Then we create the corresponding stochastic initiator matrix V\ by replacing each "1" and "0" of 
K\ with a and (3 respectively ((3 < a). The resulting probability matrices maintain — with some 
random noise — the self-similar structure of the Kronecker graphs in the previous section (which, 
for clarity, we call deterministic Kronecker graphs). We defer the discussion of how to automatically 
estimate V\ from data G to the next section. 
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The datasets we use here are: 

• ClT-HEP-TH: This is a citation graph for High-Energy Physics Theory research pap ers from 



pre-p rint archive ArXiv, with a total of N =29,555 papers and E =352,807 citations (iGehrke et al 



20031) . We follow its evolution from January 1993 to April 2003, with one data-point per 



month. 

As-RouteViews: We also analyze a static d ataset consisting of a single snapshot of con 



nectivity among Internet Autonomous Systems (|Route Views!. 1 1 9971) from January 2000, with 
N =6,414 and E =26,467. 

Results are shown in Figure |7] for the ClT-HEP-TH graph which evolves over time. We show 
the plots of two static and two temporal patterns. We see that the deterministic Kronecker model 
already to some degree captures the qualitative structure of the degree and eigenvalue distributions, 
as well as the temporal patterns represented by the densification power law and the stabilizing di- 
ameter. However, the deterministic nature of this model results in "staircase" behavior, as shown 
in scree plot for the deterministic Kronecker graph of Figure [7] (column (b), second row). We see 
that the Stochastic Kronecker graphs smooth out these distributions, further matching the qualita- 
tive structure of the real data, and they also match the shrinking-before-stabilization trend of the 
diameters of real graphs. 

Similarly, Figure[8]shows plots for the static patterns in the Autonomous systems (As-RouteViews) 
graph. Recall that we analyze a single, static network snapsh ot in this case. In additio n to the degree 



distribution and scree plot, we also show two typical plots (iChakrabarti et all 120041) : the distribu- 
tion of network values (principal eigenvector components, sorted, versus rank) and the hop-plot (the 
number of reachable pairs g(h) within h hops or less, as a function of the number of hops h). Notice 
that, again, the Stochastic Kronecker graph matches well the properties of the real graph. 

4.2 Parameter space of Kronecker graphs 

Last we present simulation experiments that investigate the parameter space of Stochastic Kronecker 
graphs. 

First, in Figure|9]we show the ability of Kronecker Graphs to generate networks with increasing, 
constant and decreasing/stabilizing effective diameter. We start with a 4-node chain initiator graph 
(shown in top row of Figure©, setting each "1" of K\ to a and each "0" to j3 = to obtain V\ that 
we then use to generate a growing sequence of graphs. We plot the effective diameter of each R(Vk) 
as we generate a sequence of growing graphs R(T>2), R{Vz), . . . ,R{Viq). R(Vio) has exactly 
1,048,576 nodes. Notice Stochastic Kronecker graphs is a very flexible model. When the generated 
graph is very sparse (low value of a) we obtain graphs with slowly increasing effective diameter 
(Figure Ufa)). For intermediate values of a we get graphs with constant diameter (Figure Hfb)) and 
that in our case also slowly densify with densification exponent a = 1.05. Last, we see an example 
of a graph with shrinking/stabilizing effective diameter. Here we set the a = 0.54 which results in 
a densification exponent of a = 1.2. Note that these observations are not contradicting T heorem ITTI 
Actually, these simulations here agree well with the analysis of ( Mahdian and Xu . 2007b . 



Next, we examine the parameter space of a Stochastic Kronecker graphs where we choose a star 
on 4 nodes as a initiator graph and parameterized with a and (3 as before. The initiator graph and 
the structure of the corresponding (deterministic) Kronecker graph adjacency matrix is shown in top 
row of Figure [3] 
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Figure 7: Citation network (ClT-HEP-TH): Patterns from the real graph (top row), the deterministic 
Kronecker graph with K\ being a star graph on 4 nodes (center + 3 satellites) (middle 
row), and the Stochastic Kronecker graph (a = 0.41, (3 = 0.11 - bottom row). Static 
patterns: (a) is the PDF of degrees in the graph (log-log scale), and (b) the distribution of 
eigenvalues (log-log scale). Temporal patterns: (c) gives the effective diameter over time 
(linear-linear scale), and (d) is the number of edges versus number of nodes over time 
(log-log scale). Notice that the Stochastic Kronecker graphs qualitatively matches all the 
patterns very well. 



Figure [Tola) shows the sharp transition in the fraction of the number of nodes that belong to the 
largest weakly connected component as we fix (3 = 0.15 and slowly increase a. Such phase tran- 
sitions on the size of the largest connected component also occur in Erdos-Renyi random graphs. 
Figure fTUl b) further explores this by plotting the fraction of nodes in the largest connected compo- 
nent (N c /N) over the full parameter space. Notice a shaip transition between disconnected (white 
area) and connected graphs (dark). 

Last, Figure [TOl c) shows the effective diameter over the parameter space (a, j3) for the 4-node 
star initiator graph. Notice that when parameter values are small, the effective diameter is small, 
since the graph is disconnected and not many pairs of nodes can be reached. The shape of the 
transition between low-high diameter closely follows the shape of the emergence of the connected 
component. Similarly, when parameter values are large, the graph is very dense, and the diameter is 
small. There is a narrow band in parameter space where we get graphs with interesting diameters. 
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(a) Degree (b) Scree plot (c) "Network value" (d) "Hop-plot" 

distribution distribution 



Figure 8: Autonomous systems (As-RouteViews): Real (top) versus Kronecker (bottom). 

Columns (a) and (b) show the degree distribution and the scree plot, as before. Columns 
(c) and (d) show two more static patterns (see text). Notice that, again, the Stochastic 
Kronecker graph matches well the properties of the real graph. 




(a) Increasing diameter 

a = 0.38, = 




4 6 
log(Nodes) 



(b) Constant diameter 

a = 0.43, /3 = 



log(Nodes) 

(c) Decreasing diameter 
a = 0.54, p = 



Figure 9: Effective diameter over time for a 4-node chain initiator graph. After each consecutive 
Kronecker power we measure the effective diameter. We use different settings of a pa- 
rameter, a = 0.38, 0.43, 0.54 and P = 0, respectively. 



5. Kronecker graph model estimation 

In previous sections we investigated various properties of networks generated by the (Stochastic) 
Kronecker graphs model. Many of these properties were also observed in real networks. Moreover, 
we also gave closed form expressions (parametric forms) for values of these statistical network 
properties, which allow us to calculate a property (e.g., diameter, eigenvalue spectrum) of a network 
directly from just the initiator matrix. So in principle, one could invert these equations and directly 
get from a property (e.g., shape of degree distribution) to the values of initiator matrix. 
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Figure 10: Fraction of nodes in the largest weakly connected component (N c /N) and the effective 
diameter for 4-star initiator graph, (a) We fix j3 = 0.15 and vary a. (b) We vary both a 
and (3. (c) Effective diameter of the network, if network is disconnected or very dense 
path lengths are short, the diameter is large when the network is barely connected. 



However, in previous sections we did not say anything about how various network properties of 
a Kronecker graph correlate and interdepend. For example, it could be the case that two network 
properties are mutually exclusive. For instance, perhaps only could only match the network diameter 
but not the degree distribution or vice versa. However, as we show later this is not the case. 

Now we turn our attention to automatically estimating the Kronecker initiator graph. The setting 
is that we are given a real network G and would like to find a Stochastic Kronecker initiator V\ that 
produces a synthetic Kronecker graph K that is "similar" to G. One way to measure similarity is to 
compare statistical network properties, like diameter and degree distribution, of graphs G and K. 

Comparing statistical properties already suggests a very direct approach to this problem: One 
could first identify the set of network properties (statistics) to match, then define a quality of fit met- 



ric an d somehow optimize over it. For example, one could use the KL divergence (IKullback and Leibler 



19511) . or the sum of squared differences between the degree distribution of the real network G and 
its synthetic counterpart K. Moreover, as we are interested in matching several such statistics be- 
tween the networks one would have to meaningfully combine these individual error metrics into a 
global error metric. So, one would have to specify what kind of properties he or she cares about 
and then combine them accordingly. This would be a hard task as the patterns of interest have very 
different magnitudes and scales. Moreover, as new network patterns are discovered, the error func- 
tions would have to be changed and models re-estimated. And even then it is not clear how to define 
the optimization procedure to maximize the quality of fit and how to perform optimization over the 
parameter space. 

Our approach here is different. Instead of committing to a set of network properties ahead 
of time, we try to directly match the adjacency matrices of the real network G and its synthetic 
counterpart K. The idea is that if the adjacency matrices are similar then the global statistical 
properties (statistics computed over K and G) will also match. Moreover, by directly working with 
the graph itself (and not summary statistics), we do not commit to any particular set of network 
statistics (network properties/patterns) and as new statistical properties of networks are discovered 
our models and estimated parameters will still hold. 
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5.1 Preliminaries 

Stochastic graph models induce probability distributions over graphs. A generative model assigns a 
probability P(G) to every graph G. P(G) is the likelihood that a given model (with a given set of 
parameters) generates the graph G. We concentrate on the Stochastic Kronecker graphs model, and 
consider fitting it to a real graph G, our data. We use the maximum likelihood approach, i.e., we 
aim to find parameter values, the initiator V\, that maximize P(G) under the Stochastic Kronecker 
graph model. 

This presents several challenges: 

• Model selection: a graph is a single structure, and not a set of items drawn independently and 
identically-distributed (i.i.d.) from some distribution. So one cannot split it into independent 
training and test sets. The fitted parameters will thus be best to generate a particular instance 
of a graph. Also, overfitting could be an issue since a more complex model generally fits 
better. 

• Node correspondence: The second challenge is the node correspondence or node labeling 
problem. The graph G has a set of A" nodes, and each node has a unique label (index, ID). 
Labels do not carry any particular meaning, they just uniquely denote or identify the nodes. 
One can think of this as the graph is first generated and then the labels (node IDs) are randomly 
assigned. This means that two isomorphic graphs that have different node labels should have 
the same likelihood. A permutation a is sufficient to describe the node correspondences 
as it maps labels (IDs) to nodes of the graph. To compute the likelihood P(G) one has to 
consider all node correspondences P(G) = P(G\a)P(a), where the sum is over all AM 
permutations a of N nodes. Calculating this super-exponential sum explicitly is infeasible 
for any graph with more than a handful of nodes. Intuitively, one can think of this summation 
as some kind of graph isomorphism test where we are searching for best correspondence 
(mapping) between nodes of G and V. 

• Likelihood estimation: Even if we assume one can efficiently solve the node correspondence 
problem, calculating P(G\a) naively takes 0(N 2 ) as one has to evaluate the probability of 
each of the A^ 2 possible edges in the graph adjacency matrix. Again, for graphs of size we 
want to model here, approaches with quadratic complexity are infeasible. 

To develop our solution we use sampling to avoid the super-exponential sum over the node 
correspondences. By exploiting the structure of the Kronecker matrix multiplication we develop an 
algorithm to evaluate P{G\a) in linear time 0{E). Since real graphs are sparse, i.e., the number of 
edges is roughly of the same order as the number of nodes, this makes fitting of Kronecker graphs 
to large networks feasible. 

5.2 Problem formulation 

Suppose we are given a graph G on N = N± nodes (for some positive integer k), and an N\ x N\ 
Stochastic Kronecker graphs initiator matrix V\. Here V\ is a parameter matrix, a set of parameters 
that we aim to estimate. For now also assume N\ , the size of the initiator matrix, is given. Later we 
will show how to automatically select it. Next, using V\ we create a Stochastic Kronecker graphs 
probability matrix Vk, where every entry p uv of Vk contains a probability that node u links to node 
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v. We then evaluate the probability that G is a realization of Vk- The task is to find such V\ that has 
the highest probability of realizing (generating) G. 
Formally, we are solving: 

argmaxP(G|^i) (3) 
Vi 

To keep the notation simpler we use standard symbol © to denote the parameter matrix V\ 
that we are trying to estimate. We denote entries of = V\ = and similarly we denote 
V = Vk = \pij]- Note that here we slightly simplified the notation: we use to refer to V\, and 0^ 
are elements of 0. Similarly, pj,- are elements of V (= Vk). Moreover, we denote K = R{V), i.e., 
K is a realization of the Stochastic Kronecker graph sampled from probabilistic adjacency matrix 
V. 

As noted before, the node IDs are assigned arbitrarily and they cany no significant information, 
which means that we have to consider all the mappings of nodes from G to rows and columns of 
stochastic adjacency matrix V. A priori all labelings are equally likely. A permutation a of the set 
{1, . . . , N} defines this mapping of nodes from G to stochastic adjacency matrix V. To evaluate the 
likelihood of G one needs to consider all possible mappings of N nodes of G to rows (columns) of 
V. For convenience we work with log-likelihood 1(0), and solve = argmaxe 1(0), where /(0) 
is defined as: 

1(0) = logP(G|0) =log^P(G|0,a)P(a|0) 

(7 

= logJ2P(G\0,(r)P(a) (4) 

cr 

The likelihood that a given initiator matrix and permutation a gave rise to the real graph G, 
P(G|0, a), is calculated naturally as follows. First, by using we create the Stochastic Kronecker 
graph adjacency matrix V = Vk = 0^- Permutation a defines the mapping of nodes of G to the 
rows and columns of stochastic adjacency matrix V. (See Figure QTJfor the illustration.) 

We then model edges as independent Bernoulli random variables parameterized by the parame- 
ter matrix 0. So, each entry p uv of V gives exactly the probability of edge (u, v) appearing. 

We then define the likelihood: 

P(G\V,a)= V[ } (1-V[a u ,a v ]), (5) 

(u,v)eG (u,v)<£G 

where we denote o~i as the i th element of the permutation a, and V[i , j] is the element at row i, 
and column j of matrix V = 0^. 

The likelihood is defined very naturally. We traverse the entries of adjacency matrix G and then 
based on whether a particular edge appeared in G or not we take the probability of edge occurring 
(or not) as given by V, and multiply these probabilities. As one has to touch all the entries of the 
stochastic adjacency matrix V evaluating Equation |5]takes 0(N 2 ) time. 

We further illustrate the process of estimating Stochastic Kronecker initiator matrix in Fig- 
ureQTJ We search over initiator matrices to find the one that maximizes the likelihood P(G\0). 
To estimate P(G\0) we are given a concrete and now we use Kronecker multiplication to create 
probabilistic adjacency matrix that is of same size as real network G. Now, we evaluate the 
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Figure 1 1 : Kronecker parameter estimation as an optimization problem. We search over the ini- 
tiator matrices G (= V\). Using Kronecker multiplication we create probabilistic ad- 
jacency matrix ©M that is of same size as real network G. Now, we evaluate the like- 
lihood by simultaneously traversing and multiplying entries of G and 0M (see Eq. [3]). 
As shown by the figure permutation a plays an important role, as permuting rows and 
columns of G could make it look more similar to 0l fc l and thus increase the likelihood. 



likelihood by traversing the corresponding entries of G and ©M. Equation [5] basically traverses 
the adjacency matrix of G, and maps every entry (u, v) of G to a corresponding entry (a u ,a v ) of 
V. Then in case that edge (u, v) exists in G (i.e., G[u, v] = 1) the likelihood that particular edge 
existing is V[a u , a v ], and similarly, in case the edge (u, v) does not exist the likelihood is simply 
1 — V[cr u ,a v ]. This also demonstrates the importance of permutation a, as permuting rows and 
columns of G could make the adjacency matrix look more "similar" to ©M, and would increase the 
likelihood. 

So far we showed how to assess the quality (likelihood) of a particular 0. So, naively one could 
perform some kind of grid search to find best 0. However, this is very inefficient. A better way of 
doing it is to compute the gradient of the log-likelihood JgZ(0), and then use the gradient to update 
the current estimate of and move towards a solution of higher likelihood. Algorithm Q] gives an 
outline of the optimization procedure. 

However, there are several difficulties with this algorithm. First, we are assuming gradient 
descent type optimization will find a good solution, i.e., the problem does not have (too many) local 
minima. Second, we are summing over exponentially many permutations in equation [4] Third, the 
evaluation of equation|5]as it is written now takes 0(N 2 ) time and needs to be evaluated AH times. 
So, given a concrete just naively calculating the likelihood takes 0(N\N 2 ) time, and then one 
also has to optimize over 0. 



Observation 2 The complexity of calculating the likelihood P(G\Q) of the graph G naively is 
0(N\N 2 ), where N is the number of nodes in G. 



Next, we show that all this can be done in linear time. 
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input : size of parameter matrix N\, graph G on N = N± nodes, and learning rate A 
output: MLE parameters 9 (Ni x Ni probability matrix) 

1 initialize Q\ 

2 while not converged do 

■> evaluate gradient: S-l(@t) 

4 update parameter estimates: &t+i = @t + ^^g - ^®*) 

5 end 

6 return 6 = <d t 

Algorithm 1: KronFit algorithm. 



5.3 Summing over the node labelings 

To maximize equation [3] using algorithm Q] we need to obtain the gradient of the log-likelihood 
JgZ(0). We can write: 

A/(«) E a ^P(G\a,e)P(a) 
d& { > J2a>P(G\a>,e)P(a>) 

p(G\e) 

a 

Note we are still summing over all NI permutations a, so calculating Eq.|6]is computationally in- 
tractable for graphs with more than a handful of nodes. However, the equation has a nice form which 
allows for use of simulation techniques to avoid the summation over super-exponentially many node 
correspondences. Thus, we simulate draws from the permutation distribution P(a\G, 0), and then 
evaluate the quantities at the sampled permutations to obtain the expected values of log-likelihood 
and gradient. Algorithm |2] gives the details. 

Note that we can also permute the rows and columns of the parameter matrix G to obtain equiv- 
alent estimates. Therefore is not strictly identifiable exactly because of these permutations. Since 
the space of permutations on ./V nodes is very large (grows as iV!) the permutation sampling algo- 
rithm will explore only a small fraction of the space of all permutations and may converge to one of 
the global maxima (but may not explore all 7Vi ! of them) of the parameter space. As we empirically 
show later our results are not sensitive to this and multiple restarts result in equivalent (but often 
permuted) parameter estimates. 



5.3.1 Sampling permutations 

Next, we describe the Metropolis algorithm to simulate draws from the permutation distribution 
P(a\G, 0), which is given by 

P((7|G ' 0) " £ T P(r,G<0) " Z 
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input : Parameter matrix 0, and graph G 




output: Log-likelihood Z(G), and gradient jjg,/(0) 


1 


iort := 1 toT do 


2 


at '■= SamplePermutation (G, 0) 


3 


l t = log P{G\a®,e) 


4 


grad t :=^logP(G\a^,Q) 


5 


end 


6 


return Z(0) = h, and ^(0) = y Ylt% rad t 



Algorithm 2: Calculating log-likelihood and gradient 



where Z is the normalizing constant that is hard to compute since it involves the sum over N\ 
elements. However, if we compute the likelihood ratio between permutations a and a' (Equation [7]) 
the normalizing constants nicely cancel out: 

iV|G,8) n yr (1-V[a u ,a v }) 

P(a\G 0) Via' a'] (I - Via' a']) 

n gyj tt (i -p^,^]) 

(u,«)6G L *' ^ \ L 1 l a u,V v \) 



This immediately suggests the use of a Metropolis sampling algorithm (iGamermanl . 1 19970 to 
simulate draws from the permutation distribution since Metropolis is solely based on such ratios 
(where normalizing constants cancel out). In particular, suppose that in the Metropolis algorithm 
(Algorithm [3]) we consider a move from permutation a to a new permutation a'. Probability of 
accepting the move to a' is given by Equation |7] (if y^rg-gj < 1) or 1 otherwise. 

Now we have to devise a way to sample permutations a from the proposal distribution. One 
way to do this would be to simply generate a random permutation a' and then check the acceptance 
condition. This would be very inefficient as we expect the distribution P(a\G, 0) to be heavily 
skewed, i.e., there will be a relatively small number of good permutations (node mappings). Even 
more so as the degree distributions in real networks are skewed there will be many bad permutations 
with low likelihood, and few good ones that do a good job in matching nodes of high degree. 

To make the sampling process "smoother", i.e., sample permutations that are not that differ- 
ent (and thus are not randomly jumping across the permutation space) we design a Markov chain. 
The idea is to stay in high likelihood part of the permutation space longer. We do this by making 
samples dependent, i.e., given a' we want to generate next candidate permutation a" to then eval- 
uate the likelihood ratio. When designing the Markov chain step one has to be careful so that the 
proposal distribution satisfies the detailed balance condition: 7r(a')P(a'\a") = ir(a")P(a"\a'), 
where P(a'\a") is the transition probability of obtaining permutation a' from a" and, 7r(a') is the 
stationary distribution. 

In Algorithm [3] we use a simple proposal where given permutation a' we generate a" by swap- 
ping elements at two uniformly at random chosen positions of a'. We refer to this proposal as 
SwapNodes. While this is simple and clearly satisfies the detailed balance condition it is also in- 
efficient in a way that most of the times low degree nodes will get swapped (a direct consequence of 
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1 



input 
output 

a (0) := 



Kronecker initiator matrix and a graph G on N nodes 
Permutation crW ~ P(a\G, 6) 

(1,...,N) 



3 



2 



i = 1 
repeat 



4 



5 



6 



Draw j and k uniformly from (1, . . . 

:= SwapNodes(o- (i-1) , j, k) 
Draw u from U(0, 1) 



N) 



i 



8 




9 end 

10 i = i + 1 

n until ffW ~ P(cr|G,G) 

12 return crW 

13 Where J7(0, 1) is a uniform distribution on [0, 1], and a' := SwapNodes ( cr, j, fe) is the 
permutation a' obtained from a by swapping elements at positions j and k. 

Algorithm 3: SamplePermutation (G, 0) : Metropolis sampling of the node permuta- 
tion. 

heavy tailed degree distributions). This has two consequences, (a) we will slowly converge to good 
permutations (accurate mappings of high degree nodes), and (b) once we reach a good permutation, 
very few permutations will get accepted as most proposed permutations a' will swap low degree 
nodes (as they form the majority of nodes). 

A possibly more efficient way would be to swap elements of a biased based on corresponding 
node degree, so that high degree nodes would get swapped more often. However, doing this directly 
does not satisfy the detailed balance condition. A way of sampling labels biased by node degrees 
that at the same time satisfies the detailed balance condition is the following: we pick an edge in G 
uniformly at random and swap the labels of the nodes at the edge endpoints. Notice this is biased 
towards swapping labels of nodes with high degrees simply as they have more edges. The detailed 
balance condition holds as edges are sampled uniformly at random. We refer to this proposal as 
SwapEdgeEndpoints. 

However, the issue with this proposal is that if the graph G is disconnected, we will only be 
swapping labels of nodes that belong to the same connected component. This means that some parts 
of the permutation space will never get visited. To overcome this problem we execute SwapNodes 
with some probability uj and SwapEdgeEndpoints with probability 1 — uj. 

To summarize we consider the following two permutation proposal distributions: 

• a" = SwapNodes (a'): we obtain a" by taking a', uniformly at random selecting a pair of 
elements and swapping their positions. 

• a" = SwapEdgeEndpoints(<r'): we obtain a" from a' by first sampling an edge (j,k) 
from G uniformly at random, then we take a' and swap the labels at positions j and k. 
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5.3.2 Speeding up the likelihood ratio calculation 

We further speed up the algorithm by using the following observation. As written the equation [7] 
takes 0(N 2 ) to evaluate since we have to consider A^ 2 possible edges. However, notice that per- 
mutations a and a' differ only at two positions, i.e. elements at position j and k are swapped, i.e., 
a and a' map all nodes except the two to the same locations. This means those elements of equa- 
tion [7] cancel out. Thus to update the likelihood we only need to traverse two rows and columns of 
matrix V, namely rows and columns j and k, since everywhere else the mapping of the nodes to the 
adjacency matrix is the same for both permutations. This gives equation [8] where the products now 
range only over the two rows/columns of V where a and a' differ. 

Graphs we are working with here are too large to allow us to explicitly create and store the 
stochastic adjacency matrix V by Kronecker powering the initiator matrix 0. Every time probability 
V[i, j] of edge is needed the equation[2]is evaluated, which takes 0{k). So a single iteration 
of Algorithm [3] takes 0(kN). 

Observation 3 Sampling a permutation a from P{a\G, 0) takes 0{kN). 

This is gives us an improvement over the 0{N\) complexity of summing over all the permuta- 
tions. So far we have shown how to obtain a permutation but we still need to evaluate the likelihood 
and find the gradients that will guide us in finding good initiator matrix. The problem here is that 
naively evaluating the network likelihood (gradient) as written in equation [6] takes time 0(N 2 ). 
This is exactly what we investigate next and how to calculate the likelihood in linear time. 

5.4 Efficiently approximating likelihood and gradient 

We just showed how to efficiently sample node permutations. Now, given a permutation we show 
how to efficiently evaluate the likelihood and it's gradient. Similarly as evaluating the likelihood 
ratio, naively calculating the log-likelihood Z(0) or its gradient ^^(0) takes time quadratic in the 
number of nodes. Next, we show how to compute this in linear time O(E). 

We begin with the observation that real graphs are sparse, which means that the number of edges 
is not quadratic but rather almost linear in the number of nodes, E -C A^ 2 . This means that majority 
of entries of graph adjacency matrix are zero, i.e., most of the edges are not present. We exploit this 
fact. The idea is to first calculate the likelihood (gradient) of an empty graph, i.e., a graph with zero 
edges, and then correct for the edges that actually appear in G. 

To naively calculate the likelihood for an empty graph one needs to evaluate every cell of graph 
adjacency matrix. We consider Taylor approximation to the likelihood, and exploit the structure of 
matrix V to devise a constant time algorithm. 

First, consider the second order Taylor approximation to log-likelihood of an edge that succeeds 
with probability x but does not appear in the graph: 



log(l — x) 



1 



2 



— X X 



2 



Calculating Z e (0), the log-likelihood of an empty graph, becomes: 




(9) 
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Notice that while the first pair of sums ranges over N elements, the last pair only ranges over 
N\ elements (Ni = log k N). Equation [9] holds due to the recursive structure of matrix V generated 
by the Kronecker product. We substitute the log(l — pi-f) with its Taylor approximation, which 
gives a sum over elements of V and their squares. Next, we notice the sum of elements of V forms 
a multinomial series, and thus J2i jPij = (Yli j @ij) k > where 9ij denotes an element of 0, and p^j 
element of 0[ fe l. 

Calculating log-likelihood of G now takes 0(E): First, we approximate the likelihood of an 
empty graph in constant time, and then account for the edges that are actually present in G, i.e., we 
subtract "no-edge" likelihood and add the "edge" likelihoods: 

Z(9) = Z e (0)+ -^g(l-V[a u ,a v ])+log(V[a u ,a v ]) 

(u,v)eG 

We note that by using the second order Taylor approximation to the log-likelihood of an empty 
graph, the error term of the approximation is |(^i% 3 ) fc > which can diverge for large k. For 
typical values of initiator matrix V\ (that we present in Section 16.51 ) we note that one needs about 
fourth or fifth order Taylor approximation for the error of the approximation actually go to zero as k 
approaches infinity, i.e., ^ . Oij n+1 < 1, where n is the order of Taylor approximation employed. 

5.5 Calculating the gradient 

Calculation of the gradient of the log-likelihood follows exactly the same pattern as described above. 
First by using the Taylor approximation we calculate the gradient as if graph G would have no edges. 
Then we correct the gradient for the edges that are present in G. As in previous section we speed 
up the calculations of the gradient by exploiting the fact that two consecutive permutations a and 
a 1 differ only at two positions, and thus given the gradient from previous step one only needs to 
account for the swap of the two rows and columns of the gradient matrix dV/d@ to update to the 
gradients of individual parameters. 



5.6 Determining the size of initiator matrix 



The question we answer next is how to determine the right number of parameters, i. e. , what is the 
right size of matrix 0? This is a classical question of model selection where there is a tradeoff 
between the complexity of the model, and the quality of the fit. Bigger model with more parameters 
usually fits better, however it is also more likely to overfit the data. 

For model selection to find the appropriate value of N\, the size of matrix 0, and choose the 
right tradeoff between the complexity o f the model and the quality of the fit, we propose to use 
the Bayes Information Criterion (BIC) (iSchwarzl . 1 1978b - Stochastic Kronecker graph model the 
presence of edges with independent Bernoulli random variables, where the canonical number of 
parameters is iVf , which is a fu nction of a lower-dimensional parameter 0. This is then a curved 
exponential family dEfronl.ll975h . and BIC naturally applies: 



BlC(iVi) = -l(B Nl ) + -Nflog(N 2 ) 



where ^ are the maximum likelihood parameters of the model with Ni x JVi parameter ma- 
trix, and N is the number of nodes in G. Note that one could also additional term to the above 
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formula to account for multiple global maxima of the likelihood space but as N\ is small the addi- 
tional term would make no real difference. 



As an alternative to BIC one could also consider the Minimum Description Length (MDL) (IRissanen . 



19781) principle where the model is scored by the quality of the fit plus the size of the description 



that encodes the model and the parameters. 



6. Experiments on real and synthetic data 

Next we described our experiments on a range of real and synthetic networks. We divide the ex- 
periments into several subsections. First we examine the convergence and mixing of the Markov 
chain of our permutation sampling scheme. Then we consider estimating the parameters of syn- 
thetic Kronecker graphs to see whether KronFit is able to recover the parameters used to generate 
the network. Last, we consider fitting Stochastic Kronecker graphs to large real world networks. 



6.1 Permutation sampling 

In our experiments we considered both synthetic and real graphs. Unless mentioned otherwise all 
synthetic Kronecker graphs were generated using V* = [0.8, 0.6; 0.5, 0.3], and k = 14 which gives 
us a graph G on N =16,384 nodes and E =115,741 edges. We chose this particular V* as it 
resembles the typical initiator for real networks analyzed later in this section. 



6.1.1 Convergence of the log-likelihood and the gradient 

First, we examine the convergence of Metropolis permutation sampling, where permutations are 
sampled sequentially. A new permutation is obtained by modifying the previous one which creates 
a Markov chain. We want to assess the convergence and mixing of the chain. We aim to determine 
how many permutations one needs to draw to reliably estimate the likelihood and the gradient, 
and also how long does it take until the samples converge to the stationary distribution. For the 
experiment we generated a synthetic Stochastic Kronecker graphs using V* as defined above. Then, 
starting with a random permutation we run Algorithm [3l and measure how the likelihood and the 
gradients converge to their true values. 

In this particular case, we first generated a Stochastic Kronecker graphs G as described above, 
but then calculated the likelihood and the parameter gradients for 6' = [0.8, 0.75; 0.45, 0.3]. We 
average the likelihoods and gradients over buckets of 1 ,000 consecutive samples, and plot how the 
log-likelihood calculated over the sampled permutations approaches the true log-likelihood (that we 
can compute since G is a Stochastic Kronecker graphs). 

First, we present experiments that aim to answer how many samples {i.e., permutations) does 
one need to draw to obtain a reliable estimate of the gradient (see Equation [6]). Figure fl2f a) 
shows how the estimated log-likelihood approaches the true likelihood. Notice that estimated values 
quickly converge to their true values, i.e., Metropolis sampling quickly moves towards "good" per- 
mutations. Similarly, Figure [T2l b) plots the convergence of the gradients. Notice that 8\\ and 822 of 
6' and V* match, so gradients of these two parameters should converge to zero and indeed they do. 
On the other hand, 612 and 821 differ between 6' and V\ . Notice, the gradient for one is positive 
as the parameter 8\2 of 0' should be decreased, and similarly for 821 the gradient is negative as the 
parameter value should be increased to match the 0'. In summary, this shows that log-likelihood 
and gradients rather quickly converge to their true values. 
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Figure 12: Convergence of the log-likelihood and components of the gradient towards their true 
values for Metropolis permutation sampling (Algorithm with the number of samples. 



In Figures ITH c) and (d) we also investigate the properties of the Markov Chain Monte Carlo 
sampling procedure, and assess convergence and mixing criteria. First, we plot the fraction of 
accepted proposals. It stabilizes at around 15%, which is quite close to the rule-of-a-thumb of 
25%. Second, Figure [T2"ld) plots the autocorrelation of the log-likelihood as a function of the lag. 
Autocorrelation of a signal X is a function of the lag k where is defined as the correlation 
of signal X at time t with X at t + k, i.e., correlation of the signal with itself at lag k. High 
autocorrelations within chains indicate slow mixing and, usually, slow convergence. On the other 
hand fast decay of autocorrelation implies better the mixing and thus one needs less samples to 
accurately estimate the gradient or the likelihood. Notice the rather fast autocorrelation decay. 

All in all, these experiments show that one needs to sample on the order of tens of thousands 
of permutations for the estimates to converge. We also verified that the variance of the estimates is 
sufficiently small. In our experiments we start with a random permutation and use long burn-in time. 
Then when performing optimization we use the permutation from the previous step to initialize the 
permutation at the current step of the gradient descent. Intuitively, small changes in parameter space 
also mean small changes in P(a\G, 0) . 
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Figure 13: Convergence of the log-likelihood and gradients for Metropolis permutation sampling 
(Algorithm[3]) for different choices of u that interpolates between the SwapNodes (lo = 
1) and SwapEdgeEndpoints (lo = 0) permutation proposal distributions. Notice 
fastest convergence of log-likelihood for lo = 0.6. 



6.1.2 Different proposal distributions 

In section 15". 3. II we defined two permutation sampling strategies: SwapNodes where we pick two 
nodes uniformly at random and swap their labels (node ids), and SwapEdgeEndpoints where 
we pick a random edge in a graph and then swap the labels of the edge endpoints. We also discussed 
that one can interpolate between the two strategies by executing SwapNodes with probability lo, 
and SwapEdgeEndpoints with probability 1 — lo. 

So, given a Stochastic Kronecker graphs G on N =16,384 and E =115,741 generated from 
V* = [0.8, 0.7; 0.5, 0.3] we evaluate the likelihood of G' = [0.8, 0.75; 0.45, 0.3]. As we sample 
permutations we observe how the estimated likelihood converges to the true likelihood. Moreover 
we also vary parameter lo which interpolates between the two permutation proposal distributions. 
The quicker the convergence towards the true log-likelihood the better the proposal distribution. 

Figure [T3]plots the convergence of the log-likelihood with the number of sampled permutations. 
We plot the average over non-overlapping buckets of 1 ,000 consecutive permutations. Faster con- 
vergence implies better permutation proposal distribution. When we use only SwapNodes (lo = 1) 
or SwapEdgeEndpoints (lo = 0) convergence is rather slow. We obtain best convergence for lo 
around 0.6. 

Similarly, Figure l\4\ a) plots the autocorrelation as a function of the lag k for different choices 
of lo. Faster autocorrelation decay means better mixing of the Markov chain. Again, notice that we 
get best mixing for lo 0.6. (Notice logarithmic y-axis.) 

Last, we diagnose how long the sampling procedure must be run before the generated samples 
can be considered to be drawn (approximately) from the stationary distribution. We call this the 
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Figure 14: (a) Autocorrelation plot of the log-likelihood for the different choices of parameter uj. 

Notice we get best mixing with u m 0.6. (b) The potential scale reduction that com- 
pares the variance inside- and across- independent Markov chains for different values of 
parameter u. 



burn-in time of the chain. There are various procedur es for assessing convergence. Here we adopt 



the approach of Gelman et al. (IGelman et all 120031) . that is based on running multiple Markov 



chains each from a different starting point, and then comparing the variance within the chain and 
between the chains. The sooner the within- and between-chain variances become equal the shorter 
the burn-in time, i.e., the sooner the samples are drawn from the stationary distribution. 

Ik) 

Let I be the parameter that is being simulated with J different chains, and then let h denote 
the k th sample of the j th chain, where j = 1, . . . , J and k = 1, . . . , K. More specifically, in our 

(k) 

case we run separate permutation sampling chains. So, we first sample permutation cr and then 

(k) 

calculate the corresponding log-likelihood <j ; . 

First, we compute between and within chain variances a 2 B and &^y, where between-chain vari- 
ance is obtained by 

where I, = * £f =1 /f and I. = J £/=i l.j 

Similarly the within-chain variance is defined by 



Then, the marginal posterior variance of I is calculated using 
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Figure 15: (a) Distribution of log-likelihood of permutations sampled uniformly at random, and 
(b) when sampled from P(er|0, G). Notice the space of good permutations is rather 
small but our sampling quickly finds permutations of high likelihood, (c) Convergence 
of log-likelihood for 10 runs of gradient descent, each from a different random starting 
point. 



And, finally, we estimate the potential scale reduction (IGelman et all 120031) of I by 



Note that as the length of the chain K 




R c onverg es to 1 from above. The recommen- 



dation for convergence assessment from (IGelman et all 120031 ) is that the potential scale reduction 
is below 1.2. 

Figure [14T b) gives the Gelman-Rubin-Brooks plot, where we plot the potential scale reduction 

V~R over the increasing chain length K for different choices of parameter u. Notice that the po- 
tential scale reduction quickly decays towards 1. Similarly as in Figure [14] the extreme values of u 
give slow decay, while we obtain the fastest potential scale reduction when u « 0.6. 



6.1.3 Properties of the permutation space 

Next we explore the properties of the permutation space. We would like to quantify what fraction 
of permutations are "good" (have high likelihood), and how quickly are they discovered. For the 
experiment we took a real network G (As-ROUTEVlEWS network) and the MLE parameters 6 for 
it that we estimated before hand (7(B) ~ —150,000). The network G has 6,474 nodes which means 
the space of all permutations has « 10 22,000 elements. 

First, we sampled 1 billion (10 9 ) permutations <7j uniformly at random, i.e., P(o~i) = 1/(6,474!) 
and for each evaluated its log-likelihood l(ai\Q) = log P(0\G, crj). We ordered the permutations in 
deceasing log-likelihood and plotted l(ai\0) vs. rank. Figure [131 a) gives the plot. Notice that very 
few random permutations are very bad (i.e., they give low likelihood), similarly few permutations 
are very good, while most of them are somewhere in between. Notice that best "random" permuta- 
tion has log-likelihood of w —320,000, which is far below true likelihood 1(0) w —150,000. This 
suggests that only a very small fraction of all permutations gives good node labelings. 
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On the other hand, we also repeated the same experiment but now using permutations sampled 
from the permutation distribution <jj ~ P(a\@,G) via our Metropolis sampling scheme. Fig- 
ure [T5jb) gives the plot. Notice the radical difference. Now the l(a\@i) very quickly converges to 
the true likelihood of w —150,000. This suggests that while the number of "good" permutations 
(accurate node mappings) is rather small, our sampling procedure quickly converges to the "good" 
part of the permutation space where node mappings are accurate, and spends the most time there. 

6.2 Properties of the optimization space 

In maximizing the likelihood we use a stochastic approximation to the gradient. This adds variance 
to the gradient and makes efficient optimization techniques, like conjugate gradient, highly unstable. 
Thus we use gradient descent, which is slower but easier to control. First, we make the following 
observation: 

Observation 4 Given a real graph G then finding the maximum likelihood Stochastic Kronecker 
initiator matrix 

= arg max P(G[0) 

is a non-convex optimization problem. 

Proof By definition permutations of the Kronecker graphs initiator matrix all have the same 
log-likelihood. This means that we have several global minima that correspond to permutations of 
parameter matrix 0, and then between them the log-likelihood drops. This means that the optimiza- 
tion problem is non-convex. ■ 

The above observation does not seem promising for estimating using gradient descent as it 
is prone to finding local minima. To test for this behavir we run the following experiment: we 
generated 100 synthetic Kronecker graphs on 16,384 (2 14 ) nodes and 1.4 million edges on the 
average, each with a randomly chosen 2x2 parameter matrix 0*. For each of the 100 graphs 
we run a single trial of gradient descent starting from a random parameter matrix 0', and try to 
recover 0*. In 98% of the cases the gradient descent converged to the true parameters. Many 
times the algorithm converged to a different global minima, i.e., is a permuted version of original 
parameter matrix 0*. Moreover, the median number of gradient descent iterations was only 52. 

This suggests surprisingly nice structure of our optimization space: it seems to behave like a 
convex optimization problem with many equivalent global minima. Moreover, this experiment is 
also a good sanity check as it shows that given a Kronecker graph we can recover and identify the 
parameters that were used to generate it. 

Moreover, Figure [Etc) plots the log-likelihood Z(0 t ) of the current parameter estimate Q t over 
the iterations t of the stochastic gradient descent. We plot the log-likelihood for 10 different runs 
of gradient descent, each time starting from a different random set of parameters 0q. Notice that in 
all runs gradient descent always converges towards the optimum, and none of the runs gets stuck in 
some local maxima. 

6.3 Convergence of the graph properties 

We approached the problem of estimating Stochastic Kronecker initiator matrix by defining the 
likelihood over the individual entries of the graph adjacency matrix. However, what we would really 
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Figure 16: Convergence of graph properties with the number of iterations of gradient descent using 
the synthetic dataset. We start with a random choice of parameters and with steps of 
gradient descent the Kronecker graph better and better matches network properties of 
the target graph. 



like is to be given a real graph G and then generate a synthetic graph K that has similar properties 
as the real G. By properties we mean network statistics that can be computed from the graph, e.g., 
diameter, degree distribution, clustering coefficient, etc. A priori it is not clear that our approach 
which tries to match individual entries of graph adjacency matrix will also be able to reproduce 
these global network statistics. However, as show next this is not the case. 

To get some understanding of the convergence of the gradient descent in terms of the network 
properties we performed the following experiment. After every step t of stochastic gradient descent, 
we compare the true graph G with the synthetic Kronecker graph K t generated using the current 
parameter estimates t . FigurefToTa) gives the convergence of log-likelihood, and (b) gives absolute 
error in parameter values \9ij — 0*j\, where 9ij £ ©i, and Q\- G 0*). Similarly, Figure [TBT c) 
plots the effective diameter, and (d) gives the largest singular value of graph adjacency matrix K as 
it converges to largest singular value of G. 

Note how with progressing iterations of gradient descent properties of graph K t quickly con- 
verge to those of G even though we are not directly optimizing the similarity in network properties: 
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Figure 17: Autonomous Systems (As-ROUTEVlEWS): Overlayed patterns of real graph and the 
fitted Kronecker graph. Notice that the fitted Kronecker graph matches patterns of the 
real graph while using only four parameters (2x2 initiator matrix). 



log-likelihood increases, absolute error of parameters decreases, diameter and largest singular value 
of K t both converge to G. This is a nice result as it shows that through maximizing the likelihood 
the resulting graphs become more and more similar also in their structural properties (even though 
we are not directly optimizing over them). 



6.4 Fitting to real-world networks 

Next, we present experiments of fitting Kronecker graph model to real-world networks. Given a real 
network G we aim to discover the most likely parameters that ideally would generate a synthetic 
graph K having similar properties as real G. This assumes that Kronecker graphs are a good model 
of the network structure, and that KronFit is able to find good parameters. In previous section 
we showed that KronFit can efficiently recover the parameters. Now we examine how well can 
Kronecker graph model the structure of real networks. 

We consider several different networks, like a graph of connectivity among Internet Autonomous 
systems (As-Route Views) with iV =6,474 and E =26,467 a who-trusts-whom type social net- 



work from Epinions (IRichardson et all 120031) (EPINIONS) with N =75,879 and E =508,960 and 
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many others. The largest network we consider for fitting is Flickr photo-sharing online social 
network with 584,207 nodes and 3,555,115 edges. 

For the purpose of this section we take a real network G, find parameters 6 using KronFit, 
generate a synthetic graph K using 0, and then compare G and K by comparing their properties 
that we introduced in section [2] In all experiments we started from a random point (random initiator 
matrix) and run gradient descent for 100 steps. At each step we estimate the likelihood and the 
gradient based on 510,000 sampled permutations where we discard first 10,000 samples to allow 
the chain to burn-in. 



6.4.1 Fitting to Autonomous Systems network 



First, we focus on the Autonomou s Systems network obtained from the University of Oregon Route 
Views project ( Route ViewsL 1997 ). Given the AS network G we run KronFit to obtain parameter 



estimates 0. Using the we then generate a synthetic Kronecker graph K, and compare the 
properties of G and K. 

Figure [FT] shows properties of As-RouteViews, and compares them with the properties of a 
synthetic Kronecker graph generated using the fitted parameters of size 2x2. Notice that proper- 
ties of both graphs match really well. The estimated parameters are = [0.987, 0.571; 0.571, 0.049]. 

Figure [171 a) compares the degree distributions of the As-ROUTEVlEWS network and its syn- 
thetic Kronecker estimate. In this and all other plots we use the exponential binning which is a 
standard procedure to de-noise the data when plotting on log-log scales. Notice a very close match 
in degree distribution between the real graph and its synthetic counterpart. 

Figure WTi h) plots the cumulative number of pairs of nodes g(h) that can be reached in < h 
hops. The hop plot gives a sense about the distribution of the shortest path lengths in the network 
and about the network diameter. Last, Figures lYJ\ c) and (d) plot the spectral properties of the graph 
adjacency matrix. Figure [r77 c) plots largest singular values vs. rank, and (d) plots the components 
of left singular vector (the network value) vs. the rank. Again notice the good agreement with the 
real graph while using only four parameters. 

Moreover, on all plots the error bars of two standard deviations show the variance of the graph 
properties for different realizations R(Q^). To obtain the error bars we took the same 0, and 
generated 50 realizations of a Kronecker graph. As for the most of the plots the error bars are so 
small to be practically invisible, this shows that the variance of network properties when generating 
a Stochastic Kronecker graph is indeed very small. 

Also notice that the As-RouteViews is an undirected graph, and that the fitted parameter 
matrix is in fact symmetric. This means that without a priori biasing the fitting towards undi- 
rected graphs, the recovered parameters obey this aspect of the network. Fitting As-RouteViews 
graph from a random set of parameters, performing gradient descent for 100 iterations and at each 
iteration sampling half a million perm utations, took less tha n 10 minutes on a standard desktop 



PC. This is a significant speedup over (IBezakova et all 120061) . where by using a similar permuta 



tion sampling approach for calculating the likelihood of a preferential attachment model on similar 
As-RouteViews graph took about two days on a cluster of 50 machines. 



6.4.2 Choice of the initiator matrix size Ni 

As mentioned earlier for finding the optimal number of parameters, i.e. , selecting the size of initiator 
matrix, BIC criterion naturally applies to the case of Kronecker graphs. Figure l23l b) shows BIC 
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Ni 


i(e) 






|{deg(u) > 0}| 


BIC score 


2 


-152,499 


8,192 


25,023 


5,675 


152,506 


3 


-127,066 


6,561 


28,790 


5,683 


127,083 


4 


-153,260 


16,384 


24,925 


8,222 


153,290 


5 


-149,949 


15,625 


29,111 


9,822 


149,996 


6 


-128,241 


7,776 


26,557 


6,623 


128,309 


As-RouteViews 


26,467 


6,474 





Table 2: Log-likelihood at MLE for different choices of the size of the initiator matrix Ni for 
the As-ROUTEVlEWS graph. Notice the log4ikelihood 1(6) generally increases with the 
model complexity N\. Also notice the effect of zero-padding, i.e., for N\ = 4 and N\ = 5 
the constraint of the number of nodes being an integer power of Ni decreases the log- 
likelihood. However, the column |{deg(-u) > 0}| gives the number of non-isolated nodes 
in the network which is much less than N£ and is in fact very close to the true number of 
nodes in the As-RouteViews. Using the BIC scores we see that Ni = 3 or Ni = 6 are 
best choices for the size of the initiator matrix. 



scores for the following experiment: We generated Kronecker graph with N =2,187 and E =8,736 
using N\ = 3 (9 parameters) and k = 7. For 1 < Ni < 9 we find the MLE parameters using 
gradient descent, and calculate the BIC scores. The model with the lowest score is chosen. As 
figure l23l b) shows we recovered the true model, i.e., BIC score is the lowest for the model with the 
true number of parameters, Ni = 3. 

Intuitively we expect a more complex model with more parameters to fit the data better. Thus 
we expect larger N\ to generally give better likelihood. On the other hand the fit will also depend on 
the size of the graph G. Kronecker graphs can only generate graphs on nodes, while real graphs 
do not necessarily have nodes (for some, preferably small, integers Ni and k). To solve this 
problem we choose k so that N^ 1 < N(G) < N*, and then augment G by adding N± — N isolated 
nodes. Or equivalently, we pad the adjacency matrix of G with zeros until it is of the appropriate 
size, iV-f x iV-f . While this solves the problem of requiring the integer power of the number of nodes 
it also makes the fitting problem harder as when N <C iV-f we are basically fitting G plus a large 
number of isolated nodes. 

Table [2] shows the results of fitting Kronecker graphs to As-RouteViews while varying the 
size of the initiator matrix N\. First, notice that in general larger 2Vi results in higher log-likelihood 
1(0) at MLE. Similarly, notice (column iV-f ) that while As-ROUTEVlEWS has 6,474 nodes, Kro- 
necker estimates have up to 16,384 nodes (16,384= 4 7 , which is the first integer power of 4 greater 
than 6,474). However, we also show the number of non-zero degree (non-isolated) nodes in the 
Kronecker graph (column |{deg(u) > 0}|). Notice that the number of non-isolated nodes well 
corresponds to the number of nodes in As-ROUTEVlEWS network. This shows that KronFit is 
actually fitting the graph well and it successfully fits the structure of the graph plus a number of 
isolated nodes. Last, column E\ gives the number of edges in the corresponding Kronecker graph 
which is close to the true number of edges of the As-RouteViews graph. 

Last, comparing the log-likelihood at the MLE and the BIC score in Table |2] we notice that the 
log-likelihood heavily dominates the BIC score. This means that the size of the initiator matrix 
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Figure 18: 3 by 3 Stochastic Kronecker graphs: Given a Stochastic Kronecker graphs G generated 
from N\ = 3 (red curve), we fit a Kronecker graph K' with N[ = 2 (green) and K" 
with N'l = 3 (blue). Not surprisingly K" fits the properties of K perfectly as the model 
is the of same complexity. On the other hand K' has only 4 parameters (instead of 9 as 
in K and K") and still fits well. 



(number of parameters) is so small that overfitting is not a concern. Thus we can just choose the 
initiator matrix that maximizes the likelihood. A simple calculation shows that one would need 
to take initiator matrices with thousands of entries before the model complexity part of BIC score 
would start to play a significant role. 

We further examine the sensitivity of the choice of the initiator size by the following experiment. 
We generate a Stochastic Kronecker graphs K on 9 parameters {N\ = 3), and then fit a Kronecker 
graph K' with a smaller number of parameters (4 instead of 9, iV{ = 2). And also a Kronecker 
graph K" of the same complexity as K (N" = 3). 

Figure [l8]plots the properties of all three graphs. Not suiprisingly K" (blue) fits the properties 
of K (red) perfectly as the initiator is of the same size. On the other hand K' (green) is a simpler 
model with only 4 parameters (instead of 9 as in K and K") and still generally fits well: hop plot 
and degree distribution match well, while spectral properties of graph adjacency matrix, especially 
scree plot, are not matched that well. This shows that nothing drastic happens and that even a bit 
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Snapshot at time 


N 


E 


i(e) 


Estimates at MLE, 6 


Ti 


2,048 


8,794 


-40,535 


[0.981, 0.633; 0.633, 0.048] 


T 2 


4,088 


15,711 


-82,675 


[0.934, 0.623; 0.622, 0.044] 


T 3 


6,474 


26,467 


-152,499 


[0.987, 0.571; 0.571, 0.049] 



Table 3: Parameter estimates of the three temporal snapshots of the As-ROUTEVlEWS network. 
Notice that estimates stay remarkably stable over time. 



too simple model still fits the data well. In general we observe empirically that by increasing the 
size of the initiator matrix one does not gain radically better fits for degree distribution and hop plot. 
On the other hand there is usually an improvement in the scree plot and the plot of network values 
when one increases the initiator size. 

6.4.3 Network parameters over time 

Next we briefly examine the evolution of the Kronecker initiator for a temporally evolving graph. 
The idea is that given parameter estimates of a real-graph G\ at time t, we can forecast the future 
structure of the graph Gt+ X at time t + x, i.e., using parameters obtained from Gt we can generate 
a larger synthetic graph K that will be similar to Gt+ X - 

As we have the information about the evolution of the As-RouteViews network, we esti- 
mated parameters for three snapshots of the network when it had about 2 k nodes. Table [3] gives the 
results of the fitting for the three temporal snapshots of the As-RouteViews network. Notice the 
parameter estimates remain remarkably stable over time. This means that Kronecker graphs can 
be used to estimate the structure of the networks in the future, i.e., parameters estimated from the 
historic data can extrapolate the graph structure in the future. 

Figure [191 further explores this. It overlays the graph properties of the real As-RouteViews 
network at time T3 and the synthetic graphs for which we used the parameters obtained on historic 
snapshots of As-RouteViews at times T\ and T 2 . The agreements are good which demonstrates 
that Kronecker graphs can forecast the structure of the network in the future. 

Moreover, this experiments also shows that parameter estimates do not suffer much from the 
zero padding of graph adjacency matrix (i.e., adding isolated nodes to make G have N k nodes). 
Snapshots of As-ROUTEVlEWS at T\ and T 2 have close to 2 k nodes, while we had to add 26% 
(1,718) isolated nodes to the network at T3 to make the number of nodes be 2 k . Regardless of 
this we see the parameter estimates 6 remain basically constant over time, which seems to be 
independent of the number of isolated nodes added. This means that the estimated parameters are 
not biased too much from zero padding the adjacency matrix of G. 

6.5 Fitting to other large real-world networks 

Last, we present results of fitting Stochastic Kronecker graphs to 20 large real-world networks: 
large online social networks, like EPINIONS, Flickr and DELICIOUS, web and blog graphs (Web- 
Notredame, Blog-nat05-6m, Blog-nat06all), internet and peer-to-peer networks (As-Newman, 
Gnutella-25, Gnutella- 30), collaboration networks of co-authorships from DBLP (C A-DBLP) 
and various areas of physics (CA-HEP-TH, CA-HEP-PH, CA-GR-QC), physics citation networks 
(Cit-hep-ph, Cit-hep-th), an email network (Email-Inside), a protein interaction network 
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Figure 19: Autonomous systems network over time (As-ROUTEVlEWS): Overlayed patterns of real 
As-RouteViews network at time T3 and the Kronecker graphs with parameters esti- 
mated from As-ROUTEVlEWS at time T\ and T 2 . Notice good fits which means that 
parameters estimated on historic snapshots can be used to estimate the graph in the fu- 
ture. 



Bio-Proteins, and a bipartite affiliation network (authors-to-papers, AtP-gr-QC). Refer to ta- 
ble [5] in the appendix for the description and basic properties of these networks. They are available 
for download at |http : / /snap . Stanford. edu| 

For each dataset we started gradient descent from a random point (random initiator matrix) and 
ran it for 100 steps. At each step we estimate the likelihood and the gradient based on 510,000 
sampled permutations where we discard first 10,000 samples to allow the chain to burn-in. 

Table @] gives the estimated parameters, the corresponding log-likelihoods and the wall clock 
times. All experiments were carried out on standard desktop computer. Notice that the estimated 
initiator matrices seem to have almost universal structure with a large value in the top left entry, 
a very small value at the bottom right corner and intermediate values in the other two corners. We 
further discuss the implications of such structure of Kronecker initiator matrix on the global network 
structure in next section. 

Last, Figures [20] and ED show overlays of various network properties of real and the estimated 
synthetic networks. In addition to the network properties we plotted in Figure [TU we also sepa- 
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Gnutella-25 


22,687 


54,705 
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-530,199 
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Table 4: Results of parameter estimation for 20 different networks. Table [5] gives the description 
and basic properties of the above network datasets. Networks are available for download 

at |http : / /snap . Stanford . eclu| 



rately plot in- and out-degree distributions (as both networks are directed) and plot the node triangle 
participation in panel (c), where we plot the number of triangles a node participates in versus the 
number of such nodes. (Again the error bars show the variance of network properties over different 
realizations R(Q^) of a Stochastic Kronecker graph.) 

Notice that for both networks and in all cases the properties of the real network and the synthetic 
Kronecker coincide really well. Using Stochastic Kronecker graphs with just 4 parameters we match 
the scree plot, degree distributions, triangle participation, hop plot and network values. 

Given the previous experiments from the Autonomous systems graph we only present the results 
for the simplest model with initiator size A^i = 2. Empirically we also observe that N\ = 2 gives 
surprisingly good fits and the estimation procedure is the most robust and converges the fastest. 
Using larger initiator matrices > 2 generally helps improve the likelihood but not dramatically. 
In terms of matching the network properties we also gent a slight improvement by making the model 
more complex. Figure [22] gives the percent improvement in log-likelihood as we make the model 
more complex. We use the log-likelihood of a 2 x 2 model as a baseline and estimate the log- 
likelihood at the MLE for larger initiator matrices. Again, models with more parameters tend to 
fit better. However, sometimes due to zero-padding of a graph adjacency matrix they actually have 
lower log-likelihood (as for example seen in Table [2]). 
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Figure 20: Blog network (Blog-nat06all): Overlayed patterns of real network and the estimated 
Kronecker graph using 4 parameters (2x2 initiator matrix). Notice that the Kronecker 
graph matches all properties of the real network. 



6.6 Scalability 

Last we also empirically evaluate the scalability of the KronFit. The experiment confirms that 
KronFit runtime scales linearly with the number of edges E in a graph G. More precisely, we 
performed the following experiment. 

We generated a sequence of increasingly larger synthetic graphs on N nodes and 8./V edges, 
and measured the time of one iteration of gradient descent, i.e., sample 1 million permutations and 
evaluate the gradients. We stalled with a graph on 1,000 nodes, and finished with a graph on 8 
million nodes, and 64 million edges. Figure l23l a) shows KronFit scales linearly with the size of 
the network. We plot wall-clock time vs. size of the graph. The dashed line gives a linear fit to the 
data points. 



7. Discussion 



Here we discuss several of the desirable properties of the proposed Kronecker graphs. 

Generality: Stochastic Kronecker graphs include several other generators as special cases: For 
9ij = c, we obtain the classical Erdos-Renyi random graph model. For 6ij £ {0, 1}, we obtain 
a determi nistic Kronecker g r aph. Setting the Ki matrix to a 2 x 2 matrix, we obtain the R-MAT 
generator (IChakrabarti et al.l.l2004r) . In contrast to Kronecker graphs, the RMAT cannot extrapolate 
into the future, since it needs to know the number of edges to insert. Thus, it is incapable of obeying 
the densification power law. 
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Figure 21: EPINIONS who-trusts-whom social network: Overlayed patterns of real network and 
the fitted Kronecker graph using only 4 parameters (2x2 initiator matrix). Again, the 
synthetic Kronecker graph matches all the properties of the real network. 



1960 



Phase transition phenomena: The Erdos-Renyi graphs exhibit phase transitions (lErdos and Renyil. 
S everal researchers argue that re al systems are "at the edge of chaos" or phase transi- 



tion dBakl.fl996l:ISole and Goodwinl lioooh. Stochastic Kronecker graphs also exhibit phase transi 
tions (|Mahdian and Xul 2007 ) for the emergence of the giant component and another phase transi- 
tion for connectivity. 

Implications to the structure of the large-real networks: Empirically we found that 2x2 
initiator (N\ = 2) fits well the properties of real-world networks. Moreover, given a 2 x 2 initiator 
matrix, one can look at it as a recursive expansion of two groups into sub-groups. We introduced 
this recursive view of Kronecker graphs back in section [3] So, one can then interpret the diagonal 
values of as the proportion of edges inside each of the groups, and the off-diagonal values give 
the fraction of edges connecting the groups. Figure |24] illustrates the setting for two groups. 



For example, as shown in Figure [241 large a, d and small b, c would imply that the network is 
composed of hierarchically nes ted communities, where there are many edges inside each community 
and few edges crossing them ( Leskovec . 2009T) . One could think of this structure as some kind 
of organizational or university hierarchy, where one expects the most friendships between people 
within same lab, a bit less between people in the same department, less across different departments, 
and the least friendships to be formed across people from different schools of the university. 

However, parameter estimates for a wide range of networks presented in Table|4]suggests a very 
different picture of the network structure. Notice that for most networks a > i > c > d Moreover, 
o«l,i«cw 0.6 and d « 0.2. We empirically observed that the same structure of initiator matrix 
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Figure 22: Percent improvement in log-likelihood over the 2x2 model as we increase the model 
complexity (size of initiator matrix). In general larger initiator matrices that have more 
degrees of freedom help improving the fit of the model. 



also holds when fitting 3 x 3 or 4 x 4 models. Always the top left element is the largest and then 
the values on the diagonal decay faster than off the diagonal (|Leskoved . l2009f) . 



T his suggests a network struc ture which is also known as core-perip hery (|Borgatti an d Everett, 



2000 : iHolmel l2005r) . the jellyfish (ITauro et aUl200ll : ISiganos et al.Ll2006h . or the octopus dChung and Lu , 
2006) structure of the network as illustrated in Figure l24f c). 

All of the above basically say that the network is composed of a densely linked network core 
and the periphery. In our case this would imply the following structure of the initiator matrix. The 
core is modeled by parameter a and the periphery by d. Most edges are inside the core (large a), and 
very few between the nodes of periphery (small d). Then t here are many m ore edges between the 
core and the periphery than inside the periphery (b, c > d) (|Leskoved. I2009T ). This is exactly what 
we see as well. And in spirit of Kronecker graphs the structure repeats recursively — the core again 
has the dense core and the periphery, and so on. And similarly the periphery itself has the core and 
the periphery. 

This suggest an "onion" like nested core-periphery ( Leskovec et al. . 2008allb|) network structure 
as illustrated in Figure 1247 c). where the network is composed of denser and denser layers as one 
moves towards the center of the network. We also observe similar structure of the Kronecker ini- 
tiator when fitting 3 x 3 or 4 x 4 initiator matrix. The diagonal elements have large but decreasing 
values with off diagonal elements following same decreasing pattern. 
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Figure 23: (a) Processor time to sample 1 million gradients as the graph grows. Notice the algo- 
rithm scales linearly with the graph size, (b) BIC score for model selection. 





(a) 2 x 2 initiator matrix (b) Two recursive communities 



(c) Core-periphery 



Figure 24: 2x2 Kronecker initiator matrix (a) can be thought of as two communities where there 
are a and d edges inside each of the communities and b and c edges crossing the two 
communities as illustrated in (b). Each community can then be recursively divided using 
the same pattern, (c) The onion like core-periphery structure where the network gets 
denser and denser as we move towards the center of the network. 



One of the implications of this is that networks do not break nicely into hierarchically orga- 
nized sets of communities that lend themselves to graph partitioning and community detection al- 
gorithms. On contrary, this suggests that large networks can be decomposed into a densely linked 
core wi th many small peripher y pieces hanging off the core. This is in accordance with our recent 
results (ILeskovec et all l2008al Jbh. that make similar observation (but based on a completely differ- 
ent methodology based on graph partitioning) about the clustering and community structure of large 
real-world networks. 
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8. Conclusion 

In conclusion, the main contribution of this work is a family of models of network structure that 
uses a non-traditional matrix operation, the Kronecker product. The resulting graphs (a) have all 
the static properties (heavy-tailed degree distribution, small diameter, etc.), (b) all the temporal 
properties (densification, shrinking diameter) that are found in real networks. And in addition, (c) 
we can formally prove all of these properties. 

Several of the proofs are extremely simple, thanks to the rich theory of Kronecker multiplication. 
We also provide proofs about the diameter and effective diameter, and we show that Stochastic 
Kronecker graphs can mimic real graphs well. 

Moreover, we also presented KronFit, a fast, scalable algorithm to estimate Stochastic Kro- 
necker initiator, which can be then used to create a synthetic graph that mimics the properties of a 
given real network. 

In contrast to earlier work, our work has the following novelties: (a) it is among the few that 
estimates the parameters of the chosen generator in a principled way, (b) it is among the few that 
has a concrete measure of goodness of the fit (namely, the likelihood), (c) it avoids the quadratic 
complexity of computing the likelihood by exploiting the properties of the Kronecker graphs, and 
(d) it avoids the factorial explosion of the node correspondence problem, by using the Metropolis 
sampling. 

The resulting algorithm matches well all the known properties of real graphs, as we show with 
the Epinions graph and the AS graph, it scales linearly on the number of edges, and it is orders of 
magnitudes faster than earlier g raph-fitting attempts: 2 minutes on a commodity PC, versus 2 days 



on a cluster of 50 workstations (Bezako va et all 12006) 



The benefits of fitting a Kronecker graph model into a real graph are several: 

• Extrapolation: Once we have the Kronecker generator G for a given real matrix G (such that 
G is mimicked by 0^), a larger version of G can be generated by 0[ fc + 1 l. 

• Null-model: When analyzing a real network G one often needs to asses the significance of 
the observation. @^ that mimics G can be used as an accurate model of G. 

• Network structure: Estimated parameters give insight into the global network and community 
structure of the network. 

• Forecasting: As we demonstrated one can obtain from a graph G t at time t such that G is 
mimicked by 0^. Then can be used to model the structure of Gt+ X in the future. 

• Sampling: Similarly, if we want a realistic sample of the real graph, we could use a smaller 
exponent in the Kronecker exponentiation, like 6[ fc-1 J. 

• Anonymization: Since 

Q[k] 

mimics G, we can publish Q^ k \ without revealing information 
about the nodes of the real graph G. 

Future work could include extensions of Kronecker graphs to evolving networks. We envision 
formulating a dynamic Bayesian network with fust order Markov dependencies, where parameter 
matrix at time t depends on the graph Gt at current time t and the parameter matrix at time t — 1. 
Given a series of network snapshots one would then aim to estimate initiator matrices at individual 
time steps and the parameters of the model governing the evolution of the initiator matrix. We 
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expect that based on the evolution of initiator matrix one would gain greater insight in the evolution 
of large networks. 

Second direction for f uture work is to explore connec t ions between Kronecker graphs and Ran- 
dom Dot Product graphs (|Young and Scheinermanl . 120071 : lNickelLl2008h . This also nicely connects 
with the "attribute view" of Kronecker graphs as described in Section 1331 It would be interesting 
to design methods to estimate the individual node attribute values as well as the attribute-attribute 
similarity matrix (i.e., the initiator matrix). As for some networks node attributes are already given 
one could then try to infer "hidden" or missing node attribute values and this way gain insight into 
individual nodes as well as individual edge formations. Moreover, this would be interesting as one 
could further evaluate how realistic is the "attribute view" of Kronecker graphs. 

Last, we also mention possible extensions of Kronecker graphs for modeling weighted and 
labeled networks. Currently Stochastic Kronecker graphs use a Bernoulli edge generation model, 
i.e., an entry of big matrix V encodes the parameter of a Bernoulli coin. In similar spirit one could 
consider entries of V to encode parameters of different edge generative processes. For example, to 
generate networks with weights on edges an entry of V could encode the parameter of an exponential 
distribution, or in case of labeled networks one could use several initiator matrices in parallel and 
this way encode parameters of a multinomial distribution over different node attribute values. 
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Appendix A. Table of networks 

Table [5] lists all the network datasets that were used in this paper. We also computed some of the 

structural network properties. Most of the networks are available for download from |http : / / snap . Stanford. edu| 
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Yahoo! Answers social network (Leskovec et al., 2008a) 

Idel . icio . usl social network (Les kovec et al. . 2008a) 

European research organization email network (Leskovec et al.. 2007a) 

Who-trusts-whom graph of epinions.com (Richardson et al. . 2003) 

Flickr photo sharing social network (Kumar et al.. 2006) 
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Bloa-to-bloa citation network (6 months of data) (Leskovec et al.. 2007b) 
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Bloa-to-bloa citation network (1 vear of data) (Leskovec et al.. 2007b) 
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Citation network of ArXiv hep-th papers (Gehrke et al.. 2003) 
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Citations network of ArXiv hep-ph papers (Gehrke et al.. 2003) 



Collaboration networks 
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DBLP co-authorship network (Backstrom et al., 2006) 
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Co-authorship network in crr-crc cateaorv of ArXiv (Leskovec et al., 2005b) 
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Co-authorship network in hep-ph cateaorv of ArXiv (Leskovec et al.. 2005b) 
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Co-authorship network in hep-th cateaorv of ArXiv (Leskovec et al., 2005b) 



Web graphs 
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Web graph of Universitv of Notre Dame (Albert et al.. 1999) 
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AS graph from Newman (new, Julv 16, 2007) 
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AS from Oreaon Route View (Leskovec et al.. 2005b) 
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Gnutella P2P network on 3/25 2000 (Ripeanu et al.. 2002) 
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Gnutella P2P network on 3/30 2000 (Ripeanu et al.. 2002) 
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Affiliation network of gr-qc category in ArXiv (Leskovec et al., 2007b) 


Biological networks 
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Yeast protein interaction network (Colizza et al.. 2005) 
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Table 5: Network datasets we analyzed. Statistics of networks we consider: number of nodes N; number of edges E, number of nodes 
in largest connected component N c , fraction of nodes in largest connected component N c /N, average clustering coefficient C; 
diameter D, and average path length D. Networks are available for download at http : / /snap .Stanford . edu| 



